
metal : pair rsets_add with rsets_rm on buffer free (fix #22593) #22595

Open
joelteply wants to merge 1 commit into ggml-org:master from joelteply:metal-fix-rsets-leak-on-buffer-free

@joelteply joelteply commented May 1, 2026


Fixes #22593.

`ggml_metal_device_rsets_add` is called from `ggml_metal_buffer_rset_init` (line 1467) for every buffer when residency sets are active. The symmetric `ggml_metal_device_rsets_rm` API is defined (line 911) but never called anywhere in the tree, so the device's `rsets->data` array keeps the entries that the per-buffer `rset_free` has just released. When `ggml_metal_device_free` calls `ggml_metal_rsets_free`, the assertion `[rsets->data count] == 0` fires deterministically.

Reproducible on macOS 15+ for any consumer that allocates buffers and tears down the device. Confirmed on Apple M5 Pro / macOS 15+ via a downstream consumer.

The fix is one line: pair the add with a remove in the buffer's `rset_free` path, before release. `buf->dev` is already populated by the time `rset_free` runs (`struct ggml_metal_buffer.dev`, line 1303) so no plumbing is needed. Build-tested locally on M5 Pro / macOS 15.
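The add/remove pairing described above can be illustrated with a small, self-contained C model. This is only a sketch under stated assumptions, not the actual ggml Metal code: every `toy_*` name, the fixed-size array, and the `int` stand-in for a residency set are hypothetical, chosen to mirror the roles of `rsets_add`, `rsets_rm`, `rset_init`, `rset_free`, and the device-free assertion.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_RSETS 16

/* Hypothetical stand-in for the device-level registry (the role of rsets->data). */
struct toy_device {
    const void *data[TOY_MAX_RSETS];
    size_t      count;
};

/* Hypothetical stand-in for a buffer that owns one residency set. */
struct toy_buffer {
    struct toy_device *dev;      /* set at init, so free needs no extra plumbing */
    int                rset;     /* placeholder for the per-buffer residency set */
    int                has_rset;
};

/* Register a residency set with the device (role of rsets_add). */
static void toy_device_rsets_add(struct toy_device *dev, const void *rset) {
    assert(dev->count < TOY_MAX_RSETS);
    dev->data[dev->count++] = rset;
}

/* Unregister a residency set (role of rsets_rm); order is not preserved. */
static void toy_device_rsets_rm(struct toy_device *dev, const void *rset) {
    for (size_t i = 0; i < dev->count; i++) {
        if (dev->data[i] == rset) {
            dev->data[i] = dev->data[--dev->count];
            return;
        }
    }
}

static void toy_buffer_rset_init(struct toy_buffer *buf, struct toy_device *dev) {
    buf->dev      = dev;
    buf->has_rset = 1;
    toy_device_rsets_add(dev, &buf->rset);
}

static void toy_buffer_rset_free(struct toy_buffer *buf) {
    if (!buf->has_rset) {
        return;
    }
    /* The paired remove: without this call the device registry would still
     * hold the entry, and toy_device_free would assert. */
    toy_device_rsets_rm(buf->dev, &buf->rset);
    buf->has_rset = 0;
}

/* Device teardown expects every buffer to have unregistered itself. */
static void toy_device_free(struct toy_device *dev) {
    assert(dev->count == 0);
}

/* Allocate n buffers, free them, tear down the device; returns leftover entries. */
static size_t toy_remaining_after_teardown(size_t n) {
    struct toy_device dev = {0};
    struct toy_buffer bufs[TOY_MAX_RSETS] = {0};
    assert(n <= TOY_MAX_RSETS);
    for (size_t i = 0; i < n; i++) toy_buffer_rset_init(&bufs[i], &dev);
    for (size_t i = 0; i < n; i++) toy_buffer_rset_free(&bufs[i]);
    toy_device_free(&dev);
    return dev.count;
}
```

In this model, deleting the `toy_device_rsets_rm` call from `toy_buffer_rset_free` leaves `dev.count` at `n`, so the assertion in `toy_device_free` trips for any `n > 0`, which is the same shape as the failure reported in #22593.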

The residency-sets API was added in #17766; the remove-side wiring was missed in that PR. This commit closes that asymmetry.

Why not the env-var workaround

`GGML_METAL_NO_RESIDENCY=1` (line 775) bypasses the entire feature, but that throws away the keep-alive heartbeat that #17766 was specifically optimizing for. For Metal-performance-critical consumers that is the wrong trade-off; this PR keeps the optimization and removes the assertion failure.

@joelteply joelteply requested a review from a team as a code owner May 1, 2026 22:19
@github-actions github-actions Bot added the ggml and Apple Metal labels May 1, 2026

ggml-gh-bot Bot commented May 1, 2026


Hi @joelteply, thanks for your contribution!

Per our contribution guidelines, the automated PR checker found the following issue(s) that need your attention:

  • AI-generated content: This project does not accept PRs, descriptions or commit messages that are fully or predominantly AI-generated. If you have used AI to assist you in writing code, please make sure to disclose that explicitly.

Please note that maintainers reserve the right to make final decisions on PRs. If you believe there is a mistake, please comment below.

ggml_metal_device_rsets_add was called from ggml_metal_buffer_rset_init
without a matching ggml_metal_device_rsets_rm on the buffer's free
path — so the device's rsets->data array always retained references
that ggml_metal_rsets_free then asserted on at device-free time
([rsets->data count] == 0). Fires deterministically on macOS 15+ for
any consumer that allocates buffers and then tears down the device.

Add the symmetric remove call in ggml_metal_buffer_rset_free, before
release. The rsets_rm API existed but was unused — this wires it up.
@joelteply joelteply force-pushed the metal-fix-rsets-leak-on-buffer-free branch from 5ee7cfc to ae2c4a5 Compare May 1, 2026 22:43
@joelteply joelteply changed the title from "metal : pair rsets_add with rsets_remove on buffer free (fix #22593)" to "metal : pair rsets_add with rsets_rm on buffer free (fix #22593)" May 1, 2026
Development

Successfully merging this pull request may close these issues.

Metal: ggml_metal_rsets_free assertion fires deterministically on device free (missing rsets_remove in buffer rset free path)
