Faster list/array computation expressions #11592

dsyme · 2021-05-21T16:53:35Z

This implements RFC FS-1099 (library support for faster computed list and array expressions) and improves the performance of those expressions.

This is drawing from the lessons of RFC FS-1087 and FS-1097 that we should (somewhat obviously) have a struct collector and just generate the synchronous code. This is what the implementation does - it looks for the compiled internal form of [ ... ] and [| ... |] and transforms to synchronous code (which is the original code with yield and yield! replaced by calls to the corresponding collector method).

For historical reasons going back to F# 1.0, the compiled internal form of [ ... ] and [| ... |] looks like this:

`Seq.toList (seq { ...Seq.append/Seq.singleton/Seq.empty/...  })`

`Seq.toArray (seq { ...Seq.append/Seq.singleton/Seq.empty/...  })`

Previously we compiled the seq { ... } into a state machine, but still called Seq.toList and Seq.toArray. This was based on thinking overly influenced by LINQ and its focus on IEnumerable for everything. However, IEnumerable is a needless inversion of control and is computationally expensive (extra allocations, MoveNext calls, etc.), so much so that LINQ is routinely avoided by some teams. Instead, when ultimately producing lists and arrays, we can produce near-optimal synchronous code directly. We should always have compiled these things in this way.
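As a rough illustration of the transformation described above, the sketch below writes the collector calls by hand. It assumes an FSharp.Core that contains the `ListCollector<'T>` type added by this PR; the function names are made up for the example, and the code the compiler actually emits may differ in detail.

```fsharp
open Microsoft.FSharp.Core.CompilerServices

// Source form: a computed list expression with yield and yield!.
let viaListExpression (i: int) (rest: string list) =
    [ yield "a"
      if i % 3 = 0 then
          yield "b"
      yield! rest ]

// Hand-written approximation of the synchronous form:
// each yield becomes Add, and the final yield! becomes AddManyAndClose.
let viaCollector (i: int) (rest: string list) =
    let mutable collector = ListCollector<string>()
    collector.Add "a"
    if i % 3 = 0 then
        collector.Add "b"
    collector.AddManyAndClose(rest)
```

Both functions should return the same list for the same inputs; only the way the result is built differs.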

Notes:

This is an optimization that preserves the same semantics and execution order, so should work for all existing code. It involves a small addition to FSharp.Core documented in the RFC above. The optimization kicks in if the collectors are present in the referenced FSharp.Core.
One particular optimization is that, for lists, a yield! of a list in tailcall position simply stitches that list into the result without copying (AddManyAndClose on ListCollector<T>). This is valid because lists are immutable - and we already do this for List.append for example. In theory this could reduce some O(n) operations to O(1) though I doubt we'll see that in practice.
There is also an optimization in ArrayCollector to avoid creating a ResizeArray for arrays of size 0, 1 or 2. This is obviously a good optimization based on any reasonable model of allocation costs. However, it may not be optimal and we can adjust this in the future, since the relevant struct fields are internal and can be changed. It would be good to further measure the stack-space/allocation/data-copying tradeoffs and decide if it's worth extending this further.
I went through tests\walkthroughs\DebugStepping\TheBigFileOfDebugStepping.fsx and made some improvements to debugging of list, array and sequence expressions and checked that all the sample list/sequence expressions in that file debug OK. Specifically the location of the debug points associated with try and with and while and finally keywords is now correctly recovered from the internal form.
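To make the small-array note above more concrete, here is a simplified sketch of a struct collector that keeps the first two elements in fields and only allocates a ResizeArray on the third Add. `SmallArrayCollector` and its members are hypothetical names for illustration; this is not the FSharp.Core `ArrayCollector` source.

```fsharp
// Hypothetical, simplified collector for illustration only.
[<Struct; NoEquality; NoComparison>]
type SmallArrayCollector<'T> =
    [<DefaultValue(false)>]
    val mutable First: 'T
    [<DefaultValue(false)>]
    val mutable Second: 'T
    [<DefaultValue(false)>]
    val mutable Rest: ResizeArray<'T>
    [<DefaultValue(false)>]
    val mutable Count: int

    member this.Add(value: 'T) =
        match this.Count with
        | 0 ->
            this.First <- value
            this.Count <- 1
        | 1 ->
            this.Second <- value
            this.Count <- 2
        | 2 ->
            // Third element: fall back to a growable buffer.
            let buffer = ResizeArray<'T>()
            buffer.Add this.First
            buffer.Add this.Second
            buffer.Add value
            this.Rest <- buffer
            this.Count <- 3
        | _ ->
            this.Rest.Add value
            this.Count <- this.Count + 1

    member this.Close() : 'T[] =
        match this.Count with
        | 0 -> Array.empty
        | 1 -> [| this.First |]
        | 2 -> [| this.First; this.Second |]
        | _ -> this.Rest.ToArray()

// Usage: collect the integers 1..n.
let firstN (n: int) =
    let mutable collector = SmallArrayCollector<int>()
    for i in 1 .. n do
        collector.Add i
    collector.Close()
```

With this shape, results of length 0, 1 or 2 cost only the final array allocation, at the price of two extra fields in the struct.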

The perf results on micro samples are good and pretty much as expected from previous experiments with using state machines for list { ... } and array {... } comprehensions. Note that some other people have experimented with faster builders using reflection emit codegen too.

Raw perf of generating 0 or 1 element lists: ~4x faster
Raw perf of generating 6-10 element lists: ~4x faster
Raw perf of generating 0 or 1 element arrays: ~4x faster
Raw perf of generating 6-10 element arrays: ~2x faster

We don't expect any change for fixed-size arrays or lists.

I don't expect any cases where this will either be slower or use more stack in a significant way compared to our old way of doing these (which is to create a sequence expression state machine and iterate).

C:\GitHub\dsyme\fsharp>artifacts\bin\fsc\Release\net472\fsc.exe --optimize a.fs && a
PERF: tinyVariableSizeBuiltin : 89
PERF: variableSizeBuiltin : 504
PERF: fixedSizeBase : 504
PERF: tinyVariableSizeBuiltin (array) : 194
PERF: variableSizeBuiltin (array) : 1251
PERF: fixedSizeBase (array) : 260

C:\GitHub\dsyme\fsharp>fsc.exe --optimize a.fs && a
PERF: tinyVariableSizeBuiltin : 356
PERF: variableSizeBuiltin : 1949
PERF: fixedSizeBase : 497
PERF: tinyVariableSizeBuiltin (array) : 717
PERF: variableSizeBuiltin (array) : 2511
PERF: fixedSizeBase (array) : 244

For correctness testing the existing tests we have are ok I think - we have zillions of computed list and array expressions in the test suites and compiler that return results of many different sizes.
Some IL code generation tests will likely fail, we'll need to update those
We need to check debug stepping (it should be possible to make this much improved if it's not already)

dsyme · 2021-05-21T17:24:54Z

Here's the code I used for performance testing:

module Lists =

    let tinyVariableSizeBuiltin () = 
        for i in 1 .. 1000000 do
            [
               if i % 3 = 0 then 
                   yield "b"
            ] |> List.length |> ignore

    let variableSizeBuiltin () = 
        for i in 1 .. 1000000 do
            [
               yield "a"
               yield "b"
               yield "b"
               yield "b"
               yield "b"
               if i % 3 = 0 then 
                   yield "b"
                   yield "b"
                   yield "b"
                   yield "b"
               yield "c"
            ] |> List.length |> ignore

    let fixedSizeBase () = 
        for i in 1 .. 1000000 do
            [
               "a"
               "b"
               "b"
               "b"
               "b"
               "b"
               "b"
               "b"
               "b"
               "c"
            ] |> List.length |> ignore

    let perf s f = 
        let t = System.Diagnostics.Stopwatch()
        t.Start()
        for i in 0 .. 5 do 
            f()
        t.Stop()
        printfn "PERF: %s : %d" s t.ElapsedMilliseconds

    perf "tinyVariableSizeBuiltin" tinyVariableSizeBuiltin

    perf "variableSizeBuiltin" variableSizeBuiltin

    perf "fixedSizeBase" fixedSizeBase
module Arrays =

    let tinyVariableSizeBuiltin () = 
        for i in 1 .. 1000000 do
            [|
               if i % 3 = 0 then 
                   yield "b"
            |] |> Array.length |> ignore

    let variableSizeBuiltin () = 
        for i in 1 .. 1000000 do
            [|
               yield "a"
               yield "b"
               yield "b"
               yield "b"
               yield "b"
               if i % 3 = 0 then 
                   yield "b"
                   yield "b"
                   yield "b"
                   yield "b"
               yield "c"
            |] |> Array.length |> ignore

    let fixedSizeBase () = 
        for i in 1 .. 1000000 do
            [|
               "a"
               "b"
               "b"
               "b"
               "b"
               "b"
               "b"
               "b"
               "b"
               "c"
            |] |> Array.length |> ignore

    let perf s f = 
        let t = System.Diagnostics.Stopwatch()
        t.Start()
        for i in 0 .. 5 do 
            f()
        t.Stop()
        printfn "PERF: %s : %d" s t.ElapsedMilliseconds

    perf "tinyVariableSizeBuiltin (array)" tinyVariableSizeBuiltin

    perf "variableSizeBuiltin (array)" variableSizeBuiltin

    perf "fixedSizeBase (array)" fixedSizeBase

dsyme · 2021-05-21T17:27:52Z

src/fsharp/FSharp.Core/seqcore.fs

+            match values with 
+            | :? ('T[]) as valuesAsArray -> 
+                for v in valuesAsArray do
+                   this.Add v


It's possible this could be a bit faster by avoiding so many writes into the fields of ListCollector. However, ListCollector is ultimately a mutable struct on the stack, so writes will be fast and it may not end up any faster.

dsyme · 2021-05-21T17:28:33Z

src/fsharp/FSharp.Core/seqcore.fs

+            // cook a faster iterator for lists and arrays
+            match values with 
+            | :? ('T[]) as valuesAsArray -> 
+                for v in valuesAsArray do


Iterating over arrays is considerably faster than iterating sequences
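A minimal sketch of the type-test fast path discussed in this review comment, written against a plain ResizeArray rather than the real collector. The helper `addMany` is hypothetical and only mirrors the shape of the diff above.

```fsharp
// Hypothetical helper, not the FSharp.Core code: append every element of
// `values` to `target`, using a direct loop when the input is an array or list.
let addMany (target: ResizeArray<'T>) (values: seq<'T>) =
    match values with
    | :? ('T[]) as valuesAsArray ->
        // Array case: iterate the array directly, without an enumerator.
        for v in valuesAsArray do
            target.Add v
    | :? ('T list) as valuesAsList ->
        // List case: walk the list directly.
        for v in valuesAsList do
            target.Add v
    | _ ->
        // General case: go through IEnumerable.
        for v in values do
            target.Add v
```

The three branches produce the same result; the type tests only select a cheaper iteration strategy.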

dsyme · 2021-05-21T22:59:41Z

Random test failure:

2021-05-21T22:37:04.9212825Z   Failed TypeCheckOutOfMemory [96 ms]
2021-05-21T22:37:04.9213392Z   Error Message:
2021-05-21T22:37:04.9214133Z    System.UnauthorizedAccessException : Access to the path 'D:\workspace\_work\1\s\vsintegration\tests\UnitTests\watson-test.fs' is denied.
2021-05-21T22:37:04.9214949Z   Stack Trace:
2021-05-21T22:37:04.9215732Z      at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath)
2021-05-21T22:37:04.9216431Z    at System.IO.File.InternalDelete(String path, Boolean checkHost)
2021-05-21T22:37:04.9217044Z    at System.IO.File.Delete(String path)
2021-05-21T22:37:04.9218022Z    at Tests.Compiler.Watson.Check.FscLevelException[TException](String simulationCode) in D:\workspace\_work\1\s\vsintegration\tests\UnitTests\Tests.Watson.fs:line 50
2021-05-21T22:37:04.9219314Z    at Tests.Compiler.Watson.WatsonTests.TypeCheckOutOfMemory() in D:\workspace\_work\1\s\vsintegration\tests\UnitTests\Tests.Watson.fs:line 115

dsyme · 2021-05-21T23:18:14Z

I went through tests\walkthroughs\DebugStepping\TheBigFileOfDebugStepping.fsx and made some general improvements to debugging of list, array and sequence expressions and checked that all the sample list/sequence expressions in that file debug OK

dsyme · 2021-05-21T23:56:32Z

This is now ready (some baselines may still need updating, but nearly all are done)

Note that a lot of sequence expression state machine generation is removed from the test baselines in favour of simpler code, hence about -2000 lines net in this PR

dsyme · 2021-05-22T12:45:04Z

OK, everything green, this is ready. I've updated the RFC and notes in the PR

dsyme · 2021-05-25T10:05:59Z

@TIHan @cartermp @KevinRansom @vzarytovskii This is ready for your review

dsyme · 2021-05-26T19:24:47Z

On review with @TIHan:

(1) Mutables are still being promoted to ref in collecting list and array code.
(2) For loops are always using IEnumerable, not "fast integer for loops" or "array loops".

eg.

let f () =
   [| let mutable x = 1  
      if today() then yield x
      if f() then yield 1 
      for i in 0.. 5 do
         if g() then yield 1  
      |]

becomes roughly:

let f () =
   let x = ref 1  
   let mutable collector = ArrayCollector<'T>()
   if today() then collector.Add(x.Value)
   if f() then collector.Add(1)
   let enum = (0..5).GetEnumerator()
   try 
     while enum.MoveNext() do
       if g() then collector.Add(1)
   finally
     enum.Dispose()
   collector.Close()

This is still always faster than the corresponding sequence expression code.

In a separate PR we could improve (2). Improving (1) is more difficult because the mutable-to-ref promotion happens well before the generation of collecting code.
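To illustrate what improving the for-loop case might look like, the sketch below compares the enumerator-based shape with a plain integer loop. It assumes an FSharp.Core that contains `ArrayCollector<'T>`; the predicate `g` and both function names are stand-ins invented for the example, and neither function is actual compiler output.

```fsharp
open Microsoft.FSharp.Core.CompilerServices

// Stand-in predicate (hypothetical): keep even numbers.
let g (i: int) = i % 2 = 0

// Shape described in the review: the range is consumed via IEnumerator.
let viaEnumerator () =
    let mutable collector = ArrayCollector<int>()
    let e = (seq { 0 .. 5 }).GetEnumerator()
    try
        while e.MoveNext() do
            if g e.Current then collector.Add e.Current
    finally
        e.Dispose()
    collector.Close()

// Possible improved shape: a fast integer for loop with no enumerator.
let viaIntegerLoop () =
    let mutable collector = ArrayCollector<int>()
    for i in 0 .. 5 do
        if g i then collector.Add i
    collector.Close()
```

Both functions build the same array; the second avoids allocating and disposing an enumerator.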

Separately we could also imagine lifting the language restriction that prevents the use of span, byref capturing etc. in list and array expressions - at least for compiled code. Quotations may still have problems with these.

kerams · 2021-05-26T19:40:49Z

Would it be hard to statically analyze (in the optimization/IL gen phase) the sequence expression and learn the minimum number of elements that will be yielded? In the case of ArrayCollector, that information could be used to preallocate a better sized resize array (or indeed the final array when we can be sure of the exact number of elements) and also skip assigning to First and Second at first.

dsyme · 2021-05-26T23:21:16Z

@kerams Yes in theory, though I think the cases where that would matter would be cases where the size projection was a formula of the input sizes being iterated

TIHan

@dsyme and I spoke on this last week. I quickly went over it today - most of the code changes are test changes. We will have new public APIs in FSharp.Core it looks like; they are meant for the codegen.

Looking at the core of the change, it will be great to have this and might be able to extend it further for other collections, such as ImmutableArray.

dsyme · 2021-06-02T11:27:54Z

> @dsyme and I spoke on this last week. I quickly went over it today - most of the code changes are test changes. We will have new public APIs in FSharp.Core it looks like; they are meant for the codegen.

Thanks - I'll merge this.

> Looking at the core of the change, it will be great to have this and might be able to extend it further for other collections, such as ImmutableArray.

Yes. I'm not yet sure of the right generalization. In principle any synchronous consumption of a seq { ... } or Seq.map/filter/... pipeline can be given this treatment, e.g. Seq.iter or a for x in seq { ... } do ..., though neither occurs that commonly in combination.

For other collections, e.g. immutable array/block, it may depend on whether we allow block [ ... ] as a special construct, or the proposal to allow [ ... ] to be used to initialize a block in the presence of known type information.

dsyme · 2021-06-02T12:12:19Z

Note also there are many other places inside FSharp.Core where we might be able to use ArrayCollector to avoid creating a ResizeArray - e.g. even just for Seq.toArray
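As a rough idea of what that could look like, here is a hypothetical `toArrayViaCollector` built on `ArrayCollector<'T>`. It assumes an FSharp.Core that contains the collector types and is not the actual Seq.toArray implementation.

```fsharp
open Microsoft.FSharp.Core.CompilerServices

// Hypothetical sketch: build an array from any sequence using the collector
// instead of an explicit ResizeArray.
let toArrayViaCollector (source: seq<'T>) : 'T[] =
    let mutable collector = ArrayCollector<'T>()
    for item in source do
        collector.Add item
    collector.Close()
```

Whether this beats the existing implementation would need measuring, especially for inputs whose size is already known.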

dsyme changed the base branch from main to feature/tasks May 21, 2021 16:53

Don Syme added 2 commits May 21, 2021 18:20

compiled list/array computations

08b0a8c

compiled list/array computations

ab980b1

dsyme force-pushed the feature/fastlist branch from 019fc1d to ab980b1 Compare May 21, 2021 17:22

dsyme changed the base branch from feature/tasks to main May 21, 2021 17:22

Don Syme added 8 commits May 21, 2021 18:49

fix build

63debfa

minor cleanup

0e65e0f

fix build

4cb1567

AddManyAndClose, accurate debug points

a21caff

AddManyAndClose fix

15435f3

fix baseline

82a951b

fix sequence points for append+sequential in computed sequences, list…

59ff381

…, arrays

fix build

4731da8

fix test

244ec7d

Don Syme added 2 commits May 22, 2021 00:32

baseline updates

679e11c

fix build

2eec1ea

update baselines

4e0cfc5

remove unused code

0329b62

dsyme closed this May 26, 2021

dsyme reopened this May 26, 2021

dsyme changed the title ~~feature/fastlist - faster list/array computation expressions~~ Faster list/array computation expressions Jun 1, 2021

TIHan approved these changes Jun 2, 2021

dsyme merged commit 0cafa21 into main Jun 2, 2021

KevinRansom deleted the feature/fastlist branch June 30, 2021 19:08

This was referenced Jul 19, 2021

Use builders to implement yield expressions fsharp/fslang-suggestions#951

Closed

List, array comprehension implementation #10532

Closed

dsyme mentioned this pull request Oct 13, 2021

Type-directed resolution of [ .. ] syntax fsharp/fslang-suggestions#1086

Open

5 tasks
