Feature request: allow lambdas in kernels when they can be evaluated at compile time #463
Comments
Hm, I started working on this, and I am seeing existing pieces of code that look very relevant: @MoFtZ
@lostmsu Thank you for your feature request. We have already discussed the feature in our weekly talk-to-dev sessions. We currently believe that we should add support for lambdas via ILGPU's dynamic specialization features. Also, we can translate calls to lambda functions into calls to "opaque" functions annotated with specific attributes. This avoids inlining and modifying these stubs that we generate. However, adding support for arbitrary lambdas also requires special care in capturing values and returning lambda closures within kernel functions. Moreover, we can add this feature to the v1.1 feature list 🚀
@m4rs-mt thanks for the promising response. Is there anyone already working on that feature? I started my own take at implementing it by replacing the key type in this dictionary (ILGPU/Src/ILGPU/Frontend/ILFrontend.cs, line 455 at 93b6551) with a MethodBase plus a Value?[] array of arguments whose values are known at compile time (in this case, a delegate pointing to a known method). This approach does not seem to align with the idea of "dynamic specialization features". Should I pause it?
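For illustration only, a composite key like the one described (a MethodBase plus an array of compile-time-known argument values) would need structural equality to serve as a dictionary key. A minimal hypothetical sketch in C# — none of these names exist in ILGPU, and this does not reflect its actual internals:

```csharp
using System;
using System.Linq;
using System.Reflection;

// Hypothetical cache key: the target method plus any argument values
// known at kernel-compilation time (unknown arguments are null).
sealed class SpecializationKey : IEquatable<SpecializationKey>
{
    public MethodBase Method { get; }
    public object?[] KnownArguments { get; }

    public SpecializationKey(MethodBase method, object?[] knownArguments)
    {
        Method = method;
        KnownArguments = knownArguments;
    }

    // Structural equality: two keys match only if they refer to the same
    // method AND the same set of known argument values.
    public bool Equals(SpecializationKey? other) =>
        other is not null &&
        Method == other.Method &&
        KnownArguments.SequenceEqual(other.KnownArguments);

    public override bool Equals(object? obj) => Equals(obj as SpecializationKey);

    public override int GetHashCode()
    {
        var hash = new HashCode();
        hash.Add(Method);
        foreach (var arg in KnownArguments)
            hash.Add(arg);
        return hash.ToHashCode();
    }
}
```

With such a key, each distinct combination of method and known constants maps to its own compiled kernel body, which is the effect the comment above is after.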
@lostmsu Yes, you are correct that lambdas are implemented as instance methods on a hidden class. Originally, ILGPU only supported static methods, which do not have an implicit 'this' argument. If you find that it is easier to make your changes if the parameter offset is 1, then it is fine to change.
@lostmsu There is no one currently working on this feature, so if you have the time and passion, we would wholeheartedly welcome your contributions. We have previously discussed how to support lambda functions to provide the functionality requested. In your example, you have supplied the lambda function as a method parameter.

Regarding "dynamic specialization features", I believe @m4rs-mt is referring to a technique similar to ILGPU's existing kernel specialization support.

Note that this is still an open-ended discussion. For example, should we support lambdas that are static member variables, like #415? Is dynamic specialization the correct approach for how it will be used? Should capturing lambdas be supported? And if so, to what extent? Also note that it is not necessary to solve all these questions now - we can slowly build some functionality while deferring other more "problematic" functionality, like capturing lambdas.
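For context, ILGPU's dynamic specialization is exposed through SpecializedValue<T>: declaring a kernel parameter with that type causes the runtime to compile (and cache) a dedicated kernel version per distinct value, so the value behaves like a compile-time constant inside the kernel body. A rough sketch assuming the standard ILGPU API — not runnable without an accelerator, and simplified from the library's own specialization samples:

```csharp
using ILGPU;
using ILGPU.Runtime;

static class SpecializationExample
{
    // Each distinct 'factor' value triggers compilation of a dedicated
    // kernel version in which factor.Value is a true compile-time constant.
    static void ScaleKernel(
        Index1D index,
        ArrayView<int> view,
        SpecializedValue<int> factor) =>
        view[index] *= factor.Value;

    static void Run(Accelerator accelerator, MemoryBuffer1D<int, Stride1D.Dense> buffer)
    {
        var kernel = accelerator.LoadAutoGroupedStreamKernel<
            Index1D, ArrayView<int>, SpecializedValue<int>>(ScaleKernel);

        // Specialize on the value 2; calling again with a different value
        // would compile and cache another specialized version.
        kernel((int)buffer.Length, buffer.View, SpecializedValue.New(2));
    }
}
```

The relevance to lambdas: if a delegate argument could be treated the same way (specializing the kernel per concrete target method), the lambda body would effectively be inlined at compile time.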
@MoFtZ the problem I see with that approach was my reasoning behind the idea to propagate the lambda at the initial compile time.
@lostmsu I don't think we'll run into any problems with respect to the parameter offset.

Regarding your suggestion and implementation: I have experimented with different ways to implement lambdas in the compiler, as they involve handling class types inside kernels. I still believe that mapping these OpCodes to partial function calls + dynamic specialization of the call sites might be the best way to implement them. Anyway, we are always open to PRs that add new features 🤓👍 I was wondering about changing the mapping to a tuple.
Sorry for the delay here @MoFtZ @m4rs-mt. Have you given this any more thought? Do you have notes? I checked out the current code that handles this. @m4rs-mt mentioned dynamic specialization. Can you elaborate on the idea? Is it different from the above? I have not looked at it, but if ILGPU already has cross-function constant propagation, that might be another way to approach the problem.
@lostmsu We have not defined a preferred API, so you are welcome to design it as you see fit. I believe that "dynamic specialization" refers to the specialization concept already used elsewhere in ILGPU.
This might now be easier with the new C# static abstract interface members. Relevant IL changes: https://github.com/dotnet/runtime/pull/49558/files
@lostmsu We recently added support for Generic Math, which makes use of Static Abstract Interface members. If you would like to try it out, it is available in a preview release of ILGPU.
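For readers unfamiliar with the C# 11 feature being referenced: static abstract interface members attach an operation to a type rather than an instance, so a call compiles to a direct call with no delegate and no 'this' reference. A plain C# illustration of the pattern, independent of ILGPU (type and method names here are invented for the example):

```csharp
// Requires C# 11 / .NET 7+.
public interface IUnaryOp<T>
{
    // The operation lives on the type itself: no instance, no 'this'.
    static abstract T Apply(T value);
}

public readonly struct Negate : IUnaryOp<int>
{
    public static int Apply(int value) => -value;
}

public static class Ops
{
    // TOp is monomorphized per concrete operation, so TOp.Apply can be
    // resolved statically and inlined as if written into the loop body.
    public static void Map<TOp>(int[] input, int[] output)
        where TOp : IUnaryOp<int>
    {
        for (int i = 0; i < input.Length; i++)
            output[i] = TOp.Apply(input[i]);
    }
}

// Usage: Ops.Map<Negate>(input, output);
```

This achieves the "essentially static" lambda behavior requested in the issue without any delegate object existing at runtime.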
I need exactly that; assume I have a dynamic composition of different algorithms (NeuraSharp). Something like this would also be useful. Declare the interfaces with static methods:

public interface IAlgorithm1
public interface IFunction1
public class MyAlgorithm1 : IAlgorithm1 where T : IFunction1
public class NormalSum1 : IFunction1 // load this as kernel

Actually I'm looking at how to automatically generate inlined IL code, but it is a daunting task; if the feature is already there, that would be great. What kind of syntax exactly is supported in the preview, just out of curiosity?
hi @Darelbi. This is a long-running thread, so some of the information above is outdated. Currently, using lambdas within a kernel is still not supported. On the plus side, Generic Math and Static Abstract Interface Member support (for .NET 7.0 onwards) is no longer in preview and is available in the latest version of ILGPU, currently v1.5.1. There is also some sample code that might meet your requirements for using interfaces.
Generic math works really well! Here is a small snippet in F# if you're interested.

module ILGpu.GenericKernels

open System
open System.Numerics
open ILGPU
open ILGPU.Runtime
open En3Tho.FSharp.Extensions

// define a set of constraints, INumber + ILGpu default ones
type Number<'TNumber
    when 'TNumber: unmanaged
    and 'TNumber: struct
    and 'TNumber: (new: unit -> 'TNumber)
    and 'TNumber :> ValueType
    and 'TNumber :> INumber<'TNumber>> = 'TNumber

module Kernels =
    // use this constraint for generic parameter in the kernel
    let inline executeSomeNumericOperations<'TNumber when Number<'TNumber>> (index: Index1D) (input: ArrayView<'TNumber>) (output: ArrayView<'TNumber>) (scalar: 'TNumber) =
        if index.X < input.Length.i32 then
            output[index] <- (input[index] * scalar + scalar) / scalar - scalar

let runKernel<'T when Number<'T>> (accelerator: Accelerator) scalar (data: 'T[]) =
    use deviceData = accelerator.Allocate1D(data)
    let kernel = accelerator.LoadAutoGroupedStreamKernel(Kernels.executeSomeNumericOperations<'T>)
    kernel.Invoke(Index1D(deviceData.Length.i32), deviceData.View, deviceData.View, scalar)
    deviceData.CopyToCPU(accelerator.DefaultStream, data)
    data |> Array.iteri ^ fun index element -> Console.WriteLine($"{index} = {element}")

let genericMap() =
    use context = Context.CreateDefault()
    let device = context.Devices |> Seq.find ^ fun x -> x.Name.Contains("GTX 1070")
    use accelerator = device.CreateAccelerator(context)
    // run with ints
    runKernel accelerator 10 [| 0; 1; 2; 3; 4; 5; 6; 7; 8; 9; |]
    // and with floats
    runKernel accelerator 10.1f [| 0.1f; 1.1f; 2.1f; 3.1f; 4.1f; 5.1f; 6.1f; 7.1f; 8.1f; 9.1f; |]
Rationale

This request is syntax sugar for creating C# classes that provide some GPGPU capabilities. Imagine you are trying to implement an ISqlCalc that needs to be able to perform a few ops on arrays using ILGPU. The point is that it should be possible to inline v => -v. The delegate instance will have a MethodInfo pointing to a body, and that method will never reference this, so it is essentially static.

Workaround

Currently, the best way I came up with to have something analogous to a UnaryOpKernel shared across all unary ops is to use generic monomorphization. While this works, it is ugly and unnecessarily wordy. The struct restriction also prevents me from at least using a class for the op (e.g. a Neg class): that fails with "Class type 'Neg' is not supported", even though this is never used and Apply is essentially static.
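The workaround code itself was not captured in this thread, but the generic-monomorphization pattern it describes is usually written with a struct type parameter implementing an interface, so each op stamps out its own kernel. A hedged reconstruction of what such a UnaryOpKernel might look like (names taken from the issue text; the actual snippet may differ):

```csharp
using ILGPU;

public interface IUnaryOp
{
    int Apply(int value);
}

// Must be a struct: ILGPU rejects class types inside kernels,
// which is exactly the "Class type 'Neg' is not supported" error above.
public readonly struct Neg : IUnaryOp
{
    public int Apply(int value) => -value;
}

public static class UnaryOpKernel
{
    // TOp is monomorphized per op, so op.Apply compiles to a direct call;
    // the default-constructed struct carries no state, making Apply
    // "essentially static" in the sense the issue describes.
    public static void Run<TOp>(Index1D index, ArrayView<int> data)
        where TOp : struct, IUnaryOp
    {
        data[index] = default(TOp).Apply(data[index]);
    }
}
```

This sketch shows why the request is "syntax sugar": writing a struct per lambda works today, but v => -v would express the same zero-state operation in one token instead of a type declaration.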