
Conversation

am17an (Collaborator) commented Oct 19, 2025

This PR adds ggml_can_fuse_subgraph, a less strict extension of ggml_can_fuse. Given the inputs/outputs of a subgraph, it checks whether all the intermediate tensors can be fused. Putting this up as a draft to iterate on the correct API.
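
For intuition, here is a minimal sketch of the check being described. The function name, signature, and the use of the ggml_node_get_use_count helper are assumptions for illustration, not the API as merged:

```c
#include "ggml.h"
#include "ggml-impl.h" // assumed location of ggml_node_get_use_count

// Minimal sketch, not the merged implementation.
// nodes[start_idx .. start_idx+count) form the candidate subgraph; every node
// not listed in `outputs` (indices relative to start_idx) is an intermediate
// and must not be consumed outside the subgraph.
static bool can_fuse_subgraph_sketch(const struct ggml_cgraph * cgraph,
                                     int start_idx, int count,
                                     const int * outputs, int num_outputs) {
    for (int i = 0; i < count; ++i) {
        const struct ggml_tensor * node = cgraph->nodes[start_idx + i];

        // nodes listed as outputs are allowed to have external consumers
        bool is_output = false;
        for (int o = 0; o < num_outputs; ++o) {
            if (outputs[o] == i) {
                is_output = true;
                break;
            }
        }
        if (is_output) {
            continue;
        }

        // count the uses of this node by later nodes *inside* the subgraph ...
        int internal_uses = 0;
        for (int j = i + 1; j < count; ++j) {
            const struct ggml_tensor * consumer = cgraph->nodes[start_idx + j];
            for (int s = 0; s < GGML_MAX_SRC; ++s) {
                if (consumer->src[s] == node) {
                    internal_uses++;
                }
            }
        }

        // ... and reject if the graph records any additional (external) use
        if (ggml_node_get_use_count(cgraph, start_idx + i) != internal_uses) {
            return false;
        }
    }
    return true;
}
```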

am17an marked this pull request as draft October 19, 2025 06:43
github-actions bot added the labels Nvidia GPU (issues specific to Nvidia GPUs) and ggml (changes relating to the ggml tensor library for machine learning) Oct 19, 2025
am17an changed the title from "Ggml can fuse subgraph" to "ggml: add ggml_can_fuse_subgraph" Oct 19, 2025
am17an requested a review from jeffbolznv October 19, 2025 06:45
jeffbolznv (Collaborator)

While this does check that the internal nodes aren't used outside of the fusion region, we also need to check that the internal connectivity of the graph is what we expect. IMO this necessarily has to be verbose, because we have to check that all the node->src[] values are what we expect them to be.

The first idea that comes to mind would be to pass in a list of triples, where each triple is a { dst_node, src_idx, src_node }, and verify that nodes[start + dst_node]->src[src_idx] == nodes[start + src_node].
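
A hypothetical sketch of that triple-based check (the struct and helper names are illustrative, not proposed API):

```c
// Illustrative only: one entry encodes an expected edge inside the fusion
// window starting at `start`: node `dst_node`'s src[src_idx] must be node
// `src_node` (both indices relative to `start`).
struct edge_check {
    int dst_node; // index of the consumer, relative to start
    int src_idx;  // which src slot of the consumer to verify
    int src_node; // index of the expected producer, relative to start
};

static bool check_edges(const struct ggml_cgraph * cgraph, int start,
                        const struct edge_check * edges, int n_edges) {
    for (int i = 0; i < n_edges; ++i) {
        const struct edge_check * e = &edges[i];
        if (cgraph->nodes[start + e->dst_node]->src[e->src_idx] !=
            cgraph->nodes[start + e->src_node]) {
            return false;
        }
    }
    return true;
}
```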

am17an (Collaborator, Author) commented Oct 20, 2025

> While this does check that the internal nodes aren't used outside of the fusion region, we also need to check that the internal connectivity of the graph is what we expect. IMO this necessarily has to be verbose, because we have to check that all the node->src[] values are what we expect them to be.

I think this is better done at the caller site, which has more context about the fusion. This function is just meant to avoid cases where a write is needed elsewhere but we end up fusing it anyway, and to provide a common place for other sanity checks. Maybe a better name for this function would be is_fusion_candidate. Passing triples like you mentioned is equivalent to building the subgraph at the caller site, just without doing the equality checks.
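
As a hypothetical caller-site sketch (loosely modeled on the CUDA topk-moe fusion this PR touches; the 3-op sequence, indices, and the sketch function from the PR description above are all illustrative):

```c
// Hypothetical backend-side check, reusing can_fuse_subgraph_sketch from
// above. The soft_max -> argsort -> view sequence is illustrative; the real
// topk-moe pattern is longer.
static bool try_fuse_example(const struct ggml_cgraph * cgraph, int i) {
    if (i + 3 > cgraph->n_nodes) {
        return false;
    }
    const struct ggml_tensor * softmax = cgraph->nodes[i];
    const struct ggml_tensor * argsort = cgraph->nodes[i + 1];
    const struct ggml_tensor * view    = cgraph->nodes[i + 2];

    if (softmax->op != GGML_OP_SOFT_MAX || argsort->op != GGML_OP_ARGSORT ||
        view->op != GGML_OP_VIEW) {
        return false;
    }

    // generic check: intermediates must not escape; only the view is an output
    const int outputs[] = { 2 };
    if (!can_fuse_subgraph_sketch(cgraph, i, 3, outputs, 1)) {
        return false;
    }

    // caller-site check: the exact src[] wiring the fused kernel assumes
    return argsort->src[0] == softmax && view->src[0] == argsort;
}
```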

jeffbolznv (Collaborator)

> I think this is better done at the caller site

It's probably fine to have it as a separate function, but IMO it can still be common code (as much as any of this can be common code - there will always be special cases we want to handle differently).

1. remove inputs from signature as they are transient nodes
2. add check for views: view_src should be part of the subgraph
am17an force-pushed the ggml_can_fuse_subgraph branch from 3059ed3 to d853036 on October 20, 2025 14:52
ggml/src/ggml.c (outdated)

```c
// if node is a view, check if the view src is within the subgraph
if (node->view_src) {
    const struct ggml_tensor * view_src = node->view_src;
```
am17an (Collaborator, Author):

Maybe we need to walk the tree until we get to the non-view parent, instead of what I did here.

Collaborator:

I guess the most conservative thing would be to check all parents. It seems plausible we'll want to fuse a view of a view in the future.

Member:

I thought this logic guarantees that view_src is always the "root" source:

llama.cpp/ggml/src/ggml.c, lines 1630 to 1646 at ad6af8f:

```c
static struct ggml_tensor * ggml_new_tensor_impl(
        struct ggml_context * ctx,
        enum   ggml_type      type,
        int                   n_dims,
        const int64_t       * ne,
        struct ggml_tensor  * view_src,
        size_t                view_offs) {

    GGML_ASSERT(type >= 0 && type < GGML_TYPE_COUNT);
    GGML_ASSERT(n_dims >= 1 && n_dims <= GGML_MAX_DIMS);

    // find the base tensor and absolute offset
    if (view_src != NULL && view_src->view_src != NULL) {
        view_offs += view_src->view_offs;
        view_src   = view_src->view_src;
    }
```

So I don't think we can ever get node->view_src->view_src != NULL. But I guess it does not hurt to have this traversal.
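
For reference, a conservative "check all view parents" walk along the lines agreed on here might look like this minimal sketch (an assumed helper, not the merged code):

```c
// Assumed helper, not the merged code: follow the node's view_src chain and
// require every ancestor to be one of the subgraph's own nodes.
static bool view_parents_in_subgraph(const struct ggml_cgraph * cgraph,
                                     int start_idx, int count,
                                     const struct ggml_tensor * node) {
    for (const struct ggml_tensor * vs = node->view_src; vs != NULL; vs = vs->view_src) {
        bool found = false;
        for (int j = 0; j < count; ++j) {
            if (cgraph->nodes[start_idx + j] == vs) {
                found = true;
                break;
            }
        }
        if (!found) {
            return false; // a view ancestor lives outside the subgraph
        }
    }
    return true;
}
```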

- check all view_src parents
- other minor review comments
jeffbolznv (Collaborator)

I implemented logic to check the graph edges in master...jeffbolznv:llama.cpp:pull_16662_ps4. We can do this as a separate change, after these other things land.

am17an marked this pull request as ready for review October 21, 2025 01:00
am17an (Collaborator, Author) commented Oct 21, 2025

Will merge once CI passes

am17an merged commit 4926419 into ggml-org:master Oct 21, 2025 (125 of 126 checks passed)
am17an deleted the ggml_can_fuse_subgraph branch October 21, 2025 08:43
ye-NX pushed a commit to ye-NX/llama.cpp that referenced this pull request Oct 21, 2025
* ggml: add ggml_can_fuse_subgraph

* ggml-cuda: use ggml_can_fuse_subgraph for topk-moe

* format

* 1. remove inputs from signature as they are transient nodes
2. add check for views: view_src should be part of the subgraph

* - combine check into one loop
- check all view_src parents
- other minor review comments

* remove redundant if test

* - rename and other minor review comments

* add assert about count < 32
FMayran pushed a commit to FMayran/llama.cpp that referenced this pull request Oct 23, 2025 (same commit message as above)
pwilkin pushed a commit to pwilkin/llama.cpp that referenced this pull request Oct 23, 2025 (same commit message as above)