Scan to Loop roadmap #1072

mathieupoumeyrolsonos · 2023-05-02T08:55:09Z

Step 0: Preliminaries

cancel chunk support

Step 1: Replace Scan by Loop

support ONNX Loop
simplify (?) the scan code by splitting it into separate ops
- state management
- scan slice and concat logic
caveat: batch i/o extraction may become harder (recognising a scan input will need some pattern matching)

State management

repurpose new operators introduced for pulsed conv state management
use skip to lock state at its initial value up to the right time
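A minimal sketch of the "skip" idea, assuming it means that state updates are ignored for the first `skip` iterations so the state stays at its initial value until the right time. The class name `SkippedState` and its `step` method are hypothetical and are not taken from the existing operators.

```python
class SkippedState:
    """Hypothetical state holder: ignores updates during the first `skip` steps."""

    def __init__(self, initial, skip):
        self.value = initial
        self.remaining_skip = skip

    def step(self, new_value):
        # Expose the current state to this iteration, then record the update
        # unless we are still inside the skipped prefix.
        current = self.value
        if self.remaining_skip > 0:
            self.remaining_skip -= 1
        else:
            self.value = new_value
        return current


state = SkippedState(initial=0, skip=2)
seen = [state.step(v) for v in [10, 20, 30, 40]]
```

Here the first two updates are dropped, so the body keeps seeing the initial value until the third iteration has written its result.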

Processing input

X is the full pulse input
DynamicSlice to extract the nth hyperplane of the tensor
we need a way to manage "n"
- if the input is a scan, n_max is known
batch input extraction optimisation:
- detect input/loop boundary/DynamicSlice
- => always perform the extraction without looking at how n actually behaves, assuming we are going to use all (or most) of the slices
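The input side can be illustrated with plain Python lists standing in for tensors. This is only a sketch: `dynamic_slice`, `run_per_iteration` and `run_batched` are hypothetical names, slicing is restricted to axis 0, and the "axis-wise" operator is assumed to be a simple row sum.

```python
def dynamic_slice(x, n):
    """Return the n-th hyperplane of x along axis 0 (x is a list of rows)."""
    return x[n]


def run_per_iteration(x, n_max):
    # Naive form: slice inside the loop, then apply the axis-wise op (row sum).
    state = 0
    for n in range(n_max):
        state = state + sum(dynamic_slice(x, n))
    return state


def run_batched(x, n_max):
    # Batch extraction: the axis-wise op is hoisted out of the loop and applied
    # to every slice up front, without inspecting how n evolves.
    row_sums = [sum(row) for row in x]
    state = 0
    for n in range(n_max):
        state = state + dynamic_slice(row_sums, n)
    return state
```

Both functions compute the same result; the batched form does the row sums once on the full input, which pays off if all (or most) slices end up being used.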

Processing output

if n is known (scan), easy
- allocate an uninitialized (or zeroed) tensor of the output size
- then MutSlice(full, axis, n, slice) -> full
- see https://www.tensorflow.org/api_docs/cc/class/tensorflow/ops/scatter-update
if n unknown
- could work with plain concat (but will copy output at each loop iteration)
- MutSlice could also reallocate the tensor if n > len
other option: hidden tensorseq
- store inside the loop the stack of tensors to concat
- after the loop, access this list to build the "full scan" output
- if we don't want a tensor sequence, the transmission between these two ops would have to go through the state...
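The output options can be compared in a small sketch, again with Python lists standing in for tensors along axis 0. `mut_slice` here is a hypothetical stand-in for the MutSlice operator (whatever its final name), and the body is assumed to simply double its input.

```python
def mut_slice(full, n, slice_):
    """Write slice_ at position n along axis 0, growing `full` if n is out of range."""
    if n >= len(full):
        # Reallocation path for the unknown-n case: pad with None placeholders.
        full.extend([None] * (n + 1 - len(full)))
    full[n] = slice_
    return full


def scan_known_n(xs):
    full = [0] * len(xs)  # preallocated (zeroed) output of the final size
    for n, x in enumerate(xs):
        full = mut_slice(full, n, x * 2)
    return full


def loop_unknown_n(source, stop):
    # Hidden "tensor sequence": stack slices inside the loop, build the full
    # output once after the loop instead of concatenating at each iteration.
    stack = []
    for x in source:
        if x == stop:
            break
        stack.append(x * 2)
    return list(stack)
```

The known-n path never reallocates, while the stack-based path defers the single copy to the end of the loop.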

Unrolling loops ?

when N is known and small (small pulse...)
useful for LSTM & co on loop-less runtimes

Current choice:

N is implemented manually inside the body, not by the loop
- three major cases:
  1. scan: n takes all consecutive values from 0 to n_max
  2. open loop: n counts up to a dynamic EOL condition
  3. general case: n is a scalar doing any random access (use case unknown)
- we want 1. to be competitive with scan, 2. to be reasonably fast, and 3. only to work
batch input extraction is done without regard to the scalar value fed to the dynamic slice
batch output extraction: same logic. We will not check what N is doing, we will just match an "axis-wise" operator followed by a MutSlice (or whatever the name is)

Step 2: flatten subgraphs

main goal: simplify axis analysis and loop input/output batch extraction
how to implement loop control flow?
- i.e. do we want the runnable model to use subgraphs, or conditional jumps in the eval order?
NNEF: do we want a loop {} construct in NNEF instead of the subgraph?
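To make the "conditional jumps in eval order" option concrete, here is a toy interpreter over a flat instruction list. It is purely illustrative: the instruction tuples, `run_flat`, and the `env` dictionary are hypothetical and do not reflect how the runnable model is actually structured.

```python
def run_flat(program, env):
    """Run a flat eval order; ("jump_if", cond, target) moves the program counter."""
    pc = 0
    while pc < len(program):
        instr = program[pc]
        if instr[0] == "jump_if":
            _, cond, target = instr
            pc = target if cond(env) else pc + 1
        else:
            _, fn = instr
            fn(env)
            pc += 1
    return env


# A loop summing env["xs"], expressed without any subgraph.
program = [
    ("op", lambda e: e.update(i=0, acc=0)),                      # 0: init
    ("jump_if", lambda e: e["i"] >= len(e["xs"]), 5),            # 1: exit test
    ("op", lambda e: e.update(acc=e["acc"] + e["xs"][e["i"]])),  # 2: body
    ("op", lambda e: e.update(i=e["i"] + 1)),                    # 3: increment
    ("jump_if", lambda e: True, 1),                              # 4: loop back
]
```

With this shape every node lives in one graph, which is what would simplify axis analysis, at the cost of an eval order that is no longer a straight line.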

mathieupoumeyrolsonos · 2023-05-16T09:09:31Z

Possible next step:

split the scanning logic and index management away from scan, materializing them in the body. No optimisation at this stage.

long term, an NNEF extension in the spirit of:

graph body_rec(xs) -> ys {

    i = 0;
    ys = zeroes[shape(xs)];
    c = zeroes[...];

    loop {
        break if i == shape(xs)[0];
        x = xs[i];
        ys = assign_slice(ys, i, F(x, c));
        c = C(x, c);
        i = i + 1;
    }

    ys
}
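For readers who want to run the idea, the pseudocode above can be transliterated to Python roughly as follows. This is a sketch under assumptions: lists stand in for tensors, the scan axis is axis 0, and `f`, `c_update` and `c0` are hypothetical parameters standing in for `F`, `C` and the initial state.

```python
def body_rec(xs, f, c_update, c0):
    i = 0
    ys = [0] * len(xs)  # zeroed output, one slot per input slice
    c = c0              # initial state
    while True:
        if i == len(xs):  # exit once every slice has been consumed
            break
        x = xs[i]
        ys[i] = f(x, c)  # plays the role of assign_slice(ys, i, F(x, c))
        c = c_update(x, c)
        i = i + 1
    return ys


# Example: a running sum, where the output and the next state coincide.
running = body_rec([1, 2, 3], lambda x, c: x + c, lambda x, c: x + c, 0)
```

The index `i` is managed by the body itself, not by the loop construct, which matches the "current choice" described earlier in the thread.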
