[RFC] Add a header for PyTorch-like operator overloading syntax #1326

@danielzgtg

Description

Coming from the original PyTorch implementations, people are finding it increasingly cumbersome to type ggml_ and ctx over and over again. One line of Python can turn into 20 lines of C++. This is creating too much friction, and we are getting lost in the boilerplate instead of being able to see the big picture.

I would like to create a header that takes advantage of C++ operator overloading. Eventually, it will include PyTorch and NumPy aliases to allow simply copying and pasting code from Python into ggml C++ with only minor fixup.

The new struct will be a thin wrapper, something like struct ggml_tensor_wrapper { ggml_tensor * data; ggml_context * ctx; };. The goal is to not change the resulting binary: the header will be header-only with everything declared inline, ultimately still calling the ggml_ series of functions.

Example 1

Before

https://github.com/ggml-org/whisper.cpp/blob/5527454cdb3e15d7e2b8a6e2afcb58cb61651fd2/src/whisper.cpp#L2231-L2259

// feed-forward network
{
    // norm
    {
        cur = ggml_norm(ctx0, inpFF, hparams.eps);

        // cur = mlp_ln_w*cur + mlp_ln_b
        cur = ggml_add(ctx0,
                ggml_mul(ctx0, cur, layer.mlp_ln_w),
                layer.mlp_ln_b);
    }

    // fully connected
    cur = ggml_mul_mat(ctx0,
            layer.mlp_0_w,
            cur);

    cur = ggml_add(ctx0, cur, layer.mlp_0_b);

    // GELU activation
    cur = ggml_gelu(ctx0, cur);

    // projection
    cur = ggml_mul_mat(ctx0,
            layer.mlp_1_w,
            cur);

    cur = ggml_add(ctx0, cur, layer.mlp_1_b);
}

After

// feed-forward network
{
    // norm
    {
        cur = inpFF.norm(hparams.eps);
        cur = cur * layer.mlp_ln_w + layer.mlp_ln_b;
    }

    cur = (layer.mlp_0_w ^ cur) + layer.mlp_0_b; // fully connected
    cur = cur.gelu();
    cur = (layer.mlp_1_w ^ cur) + layer.mlp_1_b; // projection
}

Example 2

Before

https://github.com/ggml-org/whisper.cpp/blob/5527454cdb3e15d7e2b8a6e2afcb58cb61651fd2/src/whisper.cpp#L2748-L2759

struct ggml_tensor * aheads_KQs = ggml_reshape_2d(ctx0, KQ_soft_max, KQ_soft_max->ne[0] * KQ_soft_max->ne[1], KQ_soft_max->ne[2]);
aheads_KQs = ggml_transpose(ctx0, aheads_KQs);
aheads_KQs = ggml_cont(ctx0, aheads_KQs);
aheads_KQs = ggml_mul_mat(ctx0, wstate.aheads_masks.m[il], aheads_KQs);
aheads_KQs = ggml_transpose(ctx0, aheads_KQs);
aheads_KQs = ggml_cont(ctx0, aheads_KQs);
aheads_KQs = ggml_reshape_3d(ctx0, aheads_KQs, KQ_soft_max->ne[0], KQ_soft_max->ne[1], wstate.aheads_masks.m[il]->ne[1]);
if (aheads_cross_QKs == NULL) {
    aheads_cross_QKs = aheads_KQs;
} else {
    aheads_cross_QKs = ggml_concat(ctx0, aheads_cross_QKs, aheads_KQs, 2);
}

After

// typedef ggml_tensor_wrapper gg
// .flatten is from PyTorch
// .T is from numpy. For convenience, tensor.T() = tensor.transpose().cont()
gg aheads_KQs{KQ_soft_max.flatten(0, 1).T()};
aheads_KQs = (wstate.aheads_masks.m[il] ^ aheads_KQs).T();
aheads_KQs = aheads_KQs.reshape(KQ_soft_max->ne[0], KQ_soft_max->ne[1], wstate.aheads_masks.m[il]->ne[1]);
if (aheads_cross_QKs == NULL) {
    aheads_cross_QKs = aheads_KQs;
} else {
    aheads_cross_QKs = aheads_cross_QKs.concat(aheads_KQs, 2);
}

Example 3

I am drowning in boilerplate at https://github.com/mmwillet/TTS.cpp/blob/0b420102d53c16f36ea75e626a3a3d40d7b26a4d/src/kokoro_model.cpp#L1141 .
