ggml: add ggml_can_fuse_subgraph #16662
While this does check that the internal nodes aren't used outside of the fusion region, we also need to check that the internal connectivity of the graph is what we expect. IMO this necessarily has to be verbose, because we have to check that all the node->src[] values are what we expect them to be. The first idea that comes to mind would be to pass in a list of triples, where each triple is a …
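The comment is cut off before it says what a triple contains. As a purely illustrative guess, the sketch below treats each triple as (consumer node index, `src[]` slot, expected producer node index) and verifies every expected edge. `struct tensor`, `struct edge`, `MAX_SRC`, and `check_edges` are hypothetical stand-ins, not ggml API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_SRC 4 /* hypothetical; ggml has its own GGML_MAX_SRC */

/* Hypothetical stand-in for struct ggml_tensor: only the src[] edges. */
struct tensor {
    struct tensor * src[MAX_SRC];
};

/* One expected edge (assumed layout): nodes[dst]->src[slot] == nodes[src]. */
struct edge {
    int dst;
    int slot;
    int src;
};

/* Returns true only if every expected edge is present in the node list. */
static bool check_edges(struct tensor * const * nodes, int n_nodes,
                        const struct edge * edges, int n_edges) {
    for (int i = 0; i < n_edges; i++) {
        const struct edge e = edges[i];
        /* reject malformed descriptions instead of reading out of bounds */
        if (e.dst < 0 || e.dst >= n_nodes || e.src < 0 || e.src >= n_nodes ||
            e.slot < 0 || e.slot >= MAX_SRC) {
            return false;
        }
        if (nodes[e.dst]->src[e.slot] != nodes[e.src]) {
            return false;
        }
    }
    return true;
}
```

Listing edges explicitly is verbose, as the comment predicts, but it makes the expected pattern self-documenting at each fusion site.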
I think this is better done at the call site, which has more context about the fusion. This function is just meant to avoid cases where the write is needed elsewhere but we end up fusing it anyway, and to provide a common function for other sanity checks. Maybe a better name for this function would be …
It's probably fine to have it as a separate function, but IMO it can still be common code (as much as any of this can be common code; there will always be special cases we want to handle differently).
2. add check for views: view_src should be part of the subgraph
Force-pushed 3059ed3 to d853036
ggml/src/ggml.c (outdated)

```c
    // if node is a view, check if the view src is within the subgraph
    if (node->view_src) {
        const struct ggml_tensor * view_src = node->view_src;
```
Maybe we need to walk the chain until we reach the non-view parent, instead of what I did here.
I guess the most conservative thing would be to check all parents. It seems plausible we'll want to fuse a view of a view in the future.
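A minimal sketch of that conservative variant, assuming the rule that a view's source must be part of the subgraph: walk the whole `view_src` chain of each node instead of checking only the immediate parent. `struct tensor`, `in_subgraph`, and `views_inside_subgraph` are hypothetical names, not ggml code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ggml_tensor. */
struct tensor {
    struct tensor * view_src; /* NULL if this tensor is not a view */
};

/* Linear membership test; fusion subgraphs are only a handful of nodes. */
static bool in_subgraph(const struct tensor * t,
                        struct tensor * const * nodes, int n_nodes) {
    for (int i = 0; i < n_nodes; i++) {
        if (nodes[i] == t) {
            return true;
        }
    }
    return false;
}

/* Every tensor on the view_src chain of every node must itself be a node
 * of the subgraph; a view of a view is handled by following the chain. */
static bool views_inside_subgraph(struct tensor * const * nodes, int n_nodes) {
    for (int i = 0; i < n_nodes; i++) {
        for (const struct tensor * p = nodes[i]->view_src; p != NULL; p = p->view_src) {
            if (!in_subgraph(p, nodes, n_nodes)) {
                return false;
            }
        }
    }
    return true;
}
```

If `view_src` is always already the root tensor, the inner loop runs at most once per node, so the extra generality costs nothing.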
I thought this logic guarantees that view_src is always the "root" source:
Lines 1630 to 1646 in ad6af8f
```c
static struct ggml_tensor * ggml_new_tensor_impl(
        struct ggml_context * ctx,
        enum   ggml_type      type,
        int                   n_dims,
        const int64_t       * ne,
        struct ggml_tensor  * view_src,
        size_t                view_offs) {

    GGML_ASSERT(type >= 0 && type < GGML_TYPE_COUNT);
    GGML_ASSERT(n_dims >= 1 && n_dims <= GGML_MAX_DIMS);

    // find the base tensor and absolute offset
    if (view_src != NULL && view_src->view_src != NULL) {
        view_offs += view_src->view_offs;
        view_src   = view_src->view_src;
    }
```
So I don't think we can ever get node->view_src->view_src != NULL. But I guess it does not hurt to have this traversal.
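The invariant can be checked in isolation. The sketch below copies the quoted `if` into a hypothetical `make_view` helper (not a ggml function) operating on a stripped-down `struct tensor`. Because every view is assumed to be created through this step, a view of a view stores the root tensor plus the accumulated offset, so a single `if` suffices rather than a loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ggml_tensor. */
struct tensor {
    struct tensor * view_src;  /* root tensor, or NULL if not a view */
    size_t          view_offs; /* byte offset into the root tensor */
};

/* Mirrors the "find the base tensor and absolute offset" step: if the
 * requested source is itself a view, re-point at its base and add its
 * offset. By induction the base is never a view, so one hop is enough. */
static struct tensor make_view(struct tensor * view_src, size_t view_offs) {
    if (view_src != NULL && view_src->view_src != NULL) {
        view_offs += view_src->view_offs;
        view_src   = view_src->view_src;
    }
    struct tensor t = { view_src, view_offs };
    return t;
}
```

Usage: `make_view(&base, 16)` followed by a view of that result at offset 8 yields a tensor whose `view_src` is `&base` and whose `view_offs` is 24.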
- check all view_src parents - other minor review comments
I implemented logic to check the graph edges in master...jeffbolznv:llama.cpp:pull_16662_ps4. We can do this as a separate change after these other things land.
Will merge once CI passes.
* ggml: add ggml_can_fuse_subgraph
* ggml-cuda: use ggml_can_fuse_subgraph for topk-moe
* format
* 1. remove inputs from signature as they are transient nodes 2. add check for views: view_src should be part of the subgraph
* - combine check into one loop - check all view_src parents - other minor review comments
* remove redundant if test
* - rename and other minor review comments
* add assert about count < 32
This PR adds `ggml_can_fuse_subgraph`, which is a less strict extension of `ggml_can_fuse`. Given the inputs/outputs of a subgraph, it checks whether all the intermediate tensors can be fused. Putting this up as a draft to iterate on the correct API.
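A minimal sketch of the check described here, assuming the subgraph is a contiguous range of graph nodes and that outputs are given as absolute node indices: every node in the range that is not an output must be consumed only by nodes inside the range. This is not the PR's implementation; `struct tensor`, `MAX_SRC`, and `can_fuse_range` are hypothetical, and a real version would also need to reject intermediates that are flagged as graph outputs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_SRC 4 /* hypothetical; ggml has its own GGML_MAX_SRC */

/* Hypothetical stand-in for struct ggml_tensor. */
struct tensor {
    struct tensor * src[MAX_SRC];
};

static bool is_output(int idx, const int * outputs, int n_outputs) {
    for (int i = 0; i < n_outputs; i++) {
        if (outputs[i] == idx) {
            return true;
        }
    }
    return false;
}

/* Returns true if nodes[start .. start+count) can be fused: every node in
 * the range that is not listed in `outputs` is read only by nodes that are
 * also inside the range. */
static bool can_fuse_range(struct tensor * const * nodes, int n_nodes,
                           int start, int count,
                           const int * outputs, int n_outputs) {
    if (start < 0 || count <= 0 || start + count > n_nodes) {
        return false;
    }
    const int end = start + count;
    for (int i = start; i < end; i++) {
        if (is_output(i, outputs, n_outputs)) {
            continue; /* outputs may be read by the rest of the graph */
        }
        /* intermediate: must not be a source of any node outside the range */
        for (int j = 0; j < n_nodes; j++) {
            if (j >= start && j < end) {
                continue;
            }
            for (int s = 0; s < MAX_SRC; s++) {
                if (nodes[j]->src[s] == nodes[i]) {
                    return false;
                }
            }
        }
    }
    return true;
}
```

The scan is quadratic in the number of graph nodes; a production version would more likely consult precomputed use counts, but the brute-force form keeps the rule easy to read.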