
Add tupdate to Tensor class and start simplifying tscatter #101

Open
Mikolaj opened this issue Apr 16, 2023 · 8 comments
Labels: help wanted (Extra attention is needed)

Comments

Mikolaj (Owner) commented Apr 16, 2023

It should be such that tupdate (tzero sh) ix v is the transpose of tindex v ix. Also

-- astScatter sh v (Z, ix) = update (tzero sh 0) ix v

Probably tscatter can then be simplified using tupdate, similarly to how tgather is currently simplified using tindex. I'm not sure how much of the current complex tgather simplification code would dualize, but at least the trivial cases should, and they offer great benefits whenever they apply.
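
To make the transpose claim concrete, here is a minimal rank-1 sketch (plain lists, not the Tensor class; index1, update1 and dot are hypothetical helper names): updating a zero vector at ix with v is the linear map adjoint to indexing at ix, and the trivial scatter above is exactly such an update into zeros.

-- A toy rank-1 model, not horde-ad code; index1, update1 and dot are
-- hypothetical helpers used only to illustrate the transpose claim.
type Vec = [Double]

index1 :: Vec -> Int -> Double
index1 v i = v !! i

-- update1 (replicate n 0) i c is the one-hot vector carrying c at position i;
-- in the rank-1 case this plays the role of astScatter sh v (Z, ix) above.
update1 :: Vec -> Int -> Double -> Vec
update1 u i c = take i u ++ [c] ++ drop (i + 1) u

dot :: Vec -> Vec -> Double
dot u v = sum (zipWith (*) u v)

-- The adjoint identity that makes update-into-zeros the transpose of indexing:
--   <index1 v i, c>  ==  <v, update1 zeros i c>
main :: IO ()
main = do
  let v = [10, 20, 30] :: Vec
  print (index1 v 1 * 5 == dot v (update1 (replicate 3 0) 1 5))  -- True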

I suppose we'd also need an Ast term for the operation, vectorization rules, and forward-pass and transpose rules. A similar operation is already implemented at the low level, because it's needed to implement scatter:

-- TODO: try to weave a similar magic as in tindex0R
-- TODO: for the non-singleton case see
-- https://github.com/Mikolaj/horde-ad/pull/81#discussion_r1096532164
updateNR :: forall m n a. (Numeric a, KnownNat m, KnownNat n)
         => OR.Array (m + n) a -> [(IndexInt m, OR.Array n a)]
         -> OR.Array (m + n) a
updateNR arr upd =
  let Data.Array.Internal.RankedS.A
        (Data.Array.Internal.RankedG.A shRaw
           Data.Array.Internal.T{offset, values}) = OR.normalize arr
      !_A = assert (offset == 0) ()
  in let sh = listShapeToShape shRaw
         f t (ix, u) =
           let v = OR.toVector u
               i = fromIntegral $ toLinearIdx @m @n sh ix
           in LA.vjoin [V.take i t, v, V.drop (i + V.length v) t]
     in OR.fromVector shRaw (foldl' f values upd)

This needs to be generalized to non-singleton indexes but, OTOH, it can be specialized to just one update, at least initially.
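
For the one-update specialization, the splice in updateNR boils down to computing the linear offset of the prefix index and joining three slices. A minimal sketch under simplifying assumptions (row-major flat Data.Vector storage, shapes as plain [Int]; linearPrefix and updateOne are hypothetical names, not horde-ad functions):

import qualified Data.Vector as V

-- Linear offset (in scalars) of a length-m prefix index into a row-major
-- tensor of shape sh; the updated block has size product (drop m sh).
linearPrefix :: [Int] -> [Int] -> Int
linearPrefix sh ix =
  let m = length ix
      blockSize = product (drop m sh)
      idxWithin = foldl (\acc (i, s) -> acc * s + i) 0 (zip ix (take m sh))
  in idxWithin * blockSize

-- Splice one already-flattened update block into the flattened base tensor.
updateOne :: V.Vector Double -> [Int] -> [Int] -> V.Vector Double
          -> V.Vector Double
updateOne base sh ix block =
  let i = linearPrefix sh ix
  in V.take i base V.++ block V.++ V.drop (i + V.length block) base

main :: IO ()
main = do
  -- A 2x3 tensor of zeros, updated at prefix index [1] with the row [7,8,9].
  let base  = V.replicate 6 0
      block = V.fromList [7, 8, 9]
  print (V.toList (updateOne base [2, 3] [1] block))
  -- [0.0,0.0,0.0,7.0,8.0,9.0]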

Overall, this ticket is a big chunk of work, but quite modular. A couple of the parts, though probably intertwined with the others, are crucial for the performance of the simplified horde-ad.

Mikolaj added the help wanted label on Apr 16, 2023

tomsmeding (Collaborator) commented

What would be the type of this new tupdate?

Mikolaj (Owner, Author) commented Apr 18, 2023

I think the simplest one that agrees with

-- astScatter sh v (Z, ix) = update (tzero sh 0) ix v

which is

tupdate :: TensorOf (p + n) r -> IndexOf p r -> TensorOf n r -> TensorOf (p + n) r

which checks out against the type of the transpose of update (tzero sh 0) ix v, namely tindex v ix:

tindex :: TensorOf (p + n) r -> IndexOf p r -> TensorOf n r

tomsmeding (Collaborator) commented

Wouldn't then tupdate base idx item necessarily copy (almost) the entirety of base? This is basically the one-hot encoding for the transposition of indexing, slightly modified to compute base + onehot i instead of just onehot i. I struggle to see how this will ever be remotely efficient if you're doing more than 1 indexing operation on an array; surely you want to batch them up into a single scatter?

Mikolaj (Owner, Author) commented Apr 18, 2023

The motivating example

let x11 = tscatter [1] (tfromList [tsum (x3 * x9)])
                       (\[i10] -> [0])
  in x11 ! [0]

has nothing interesting to batch in a single scatter. Similarly, a transpose of indexing has just one one-hot, not a collection of them. I guess a general rule for indexing of tupdate would permit us to perform the indexing from the motivating example early and not materialize any of the large tensors. In other cases, we can interpret/compile sequential tupdates jointly; we can think of them as associative accumulators.

Even if we end up batching many things up in a single scatter, we have to represent them somehow while they are sprinkled in many places of the generated code. I'm guessing trivial cases of scatter may not be the best way. Then we can transform the code to get these things together and then, eventually, batch them up.

tomsmeding (Collaborator) commented

> Even if we end up batching many things up in a single scatter, we have to represent them somehow while they are sprinkled in many places of the generated code. I'm guessing trivial cases of scatter may not be the best way. Then we can transform the code to get these things together and then, eventually, batch them up.

Ah, I see, you want an easier-to-recognise representation for trivial scatters. Because I feel that your given trivial scatter won't really be much slower than the corresponding tupdate, simply because all the overhead is in the copying of the base tensor. But if your point with tupdate is not performance but recognisability and hence easier recombination later in an efficient single scatter, then yes that makes sense.

Though I wonder if it's necessary. Maybe we can find a way to combine (vectorise, essentially) more general forms of tscatter in a way that is not too hard to implement and subsumes the cases where tupdate would be useful.

But that depends on how they appear in the code to simplify, which in turn depends on how the indexing operations appear in the original program. If they appear easily batchable there already, then the problem doesn't even arise because the things are immediately vectorised to a gather anyway. Do you happen to have a motivating example here?

Mikolaj (Owner, Author) commented Apr 18, 2023

> Ah, I see, you want an easier-to-recognise representation for trivial scatters.

Yes, that's the main point.

> Because I feel that your given trivial scatter won't really be much slower than the corresponding tupdate, simply because all the overhead is in the copying of the base tensor.

Sure, but if I have a rule

tupdate u ix v ! ix --> v

then this is faster than leaving the scatter be, materializing it and then projecting. But, again, the rule can just as well be written for scatter, not tupdate, so it's mostly about presentation.
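
To make the shape of such a rule concrete, here is a sketch on a hypothetical miniature AST (concrete integer indexes and purely syntactic index equality; none of these constructors are horde-ad's real Ast):

-- A hypothetical toy AST, only to show the rule's shape: indexing an update
-- at the very same (syntactic) index returns the updated part directly,
-- without materializing the large tensor.
data Ast
  = AstVar String
  | AstIndex Ast [Int]        -- t ! ix, with a concrete integer index
  | AstUpdate Ast [Int] Ast   -- tupdate t ix v
  deriving (Show, Eq)

simplifyIndex :: Ast -> Ast
simplifyIndex (AstIndex (AstUpdate _u ix v) ix')
  | ix == ix' = v             -- tupdate u ix v ! ix --> v
simplifyIndex t = t

main :: IO ()
main = print (simplifyIndex
                (AstIndex (AstUpdate (AstVar "u") [0] (AstVar "v")) [0]))
-- prints: AstVar "v"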

> Though I wonder if it's necessary. Maybe we can find a way to combine (vectorise, essentially) more general forms of tscatter in a way that is not too hard to implement and subsumes the cases where tupdate would be useful.

That would be great.

> But that depends on how they appear in the code to simplify, which in turn depends on how the indexing operations appear in the original program. If they appear easily batchable there already, then the problem doesn't even arise because the things are immediately vectorised to a gather anyway. Do you happen to have a motivating example here?

Not really. But once we construct tscatter in whatever smart way, we'd want to fuse tscatter and simplify it in other ways. What I have, somewhat tangentially, are the corresponding rules for tgather, e.g.,

(k :$ sh', (var ::: vars, i1 :. rest1)) ->
  if | not (any (`intVarInAstInt` i1) vars0) ->
       astGatherZOrStepOnly stepOnly sh0 (astIndex v0 (i1 :. ZI))
                            (vars0, rest1)
     | case iN of
         AstIntVar varN' ->
           varN' == varN
           && not (any (varN `intVarInAstInt`) restN)
           && case ( dropShape @(m - 1) sh0
                   , dropShape @(p - 1) (shapeAst v0) ) of
                (kN :$ _, vkN :$ _) -> kN == vkN
                _ -> error "impossible pattern needlessly required"
         _ -> False
       -> astGatherZOrStepOnly stepOnly sh0 v0 (varsN, restN)
     | intVarInIndex var ix0 ->
       astGatherCase sh0 v0 (vars0, ix0)
     | any (`intVarInIndex` ix0) vars ->
       astKonst k (astGatherZOrStepOnly stepOnly sh' v0 (vars, ix0))
     | otherwise ->
       astKonstN sh0 (astIndex v0 ix0)

that simplify tgather a lot and use indexing (astIndex). I can't write such rules for tscatter, because I don't have tupdate (and using tgather instead of tindex and tscatter instead of tupdate would quickly lead to insanity).

Mikolaj (Owner, Author) commented Apr 25, 2023

This is killing my CI, so I will have to at least add the update term so that it takes less memory than the special case of scatter. Then I'd either start simplifying indexing of update or fuse many updates into one. That's still very ad hoc and much easier than dualising the simplification and fusion of gather in general, if that's possible at all.

Mikolaj (Owner, Author) commented Apr 25, 2023

Eventually I simplified the scatters that are the transpose of indexing and I also started simplifying some special forms of scatters. This helped with test speed, but not nearly enough. All without introducing tupdate yet, which would probably just be tupdate (c, ix) = AstScatter sh c (Z, ix) (which seems to be precisely dual to indexing both when transposing and when comparing scatter and gather simplification rules) or tupdate t (c, ix) = t + AstScatter sh c (Z, ix) (which may or may not fuse better in some cases). Other variants seem to have problems when getting vectorized.

All in all, scatter can certainly be fused with other scatters and can be simplified a bit more, but I'm no longer certain we can just reverse arrows in the gather simplification code. Reversing arrows seems tricky.
