Add callback for profiling GPU memory usage by GMNGeoffrey · Pull Request #249 · aqlaboratory/openfold-3

GMNGeoffrey · 2026-06-09T01:34:00Z

Summary
It's useful to track where GPU memory is used in the model and where the peaks occur, so we can fit larger inputs (or run on smaller GPUs).

Changes

Adds an optional callback that tracks GPU memory usage with the torch.cuda.memory utilities. It writes a pickle (.pkl) dump of all allocations over time and logs the peak memory usage.
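For readers who have not used the snapshot utilities, here is a minimal sketch of what such a callback could look like. It is not the code in this PR: the class name `MemorySnapshotSketch`, the `max_entries` default, and the plain-class shape (rather than a Lightning `Callback` subclass) are assumptions for illustration. It also assumes PyTorch 2.1 or newer, where `_record_memory_history(enabled=None)` stops recording, and it is a no-op when CUDA is unavailable.

```python
from pathlib import Path

try:
    import torch
    _HAS_CUDA = torch.cuda.is_available()
except ImportError:  # lets the sketch be imported without PyTorch installed
    torch = None
    _HAS_CUDA = False


class MemorySnapshotSketch:
    """Hypothetical callback: records allocator history around one predict batch."""

    def __init__(self, snapshot_path, max_entries=100_000):
        self.snapshot_path = Path(snapshot_path)
        self.max_entries = max_entries  # assumed cap on recorded allocator events
        self.peak_bytes = None  # stays None when CUDA is unavailable

    def on_predict_batch_start(self, *args, **kwargs):
        if not _HAS_CUDA:
            return
        # Reset the peak counter so the reported peak covers only this batch.
        torch.cuda.reset_peak_memory_stats()
        torch.cuda.memory._record_memory_history(max_entries=self.max_entries)

    def on_predict_batch_end(self, *args, **kwargs):
        if not _HAS_CUDA:
            return
        self.peak_bytes = torch.cuda.max_memory_allocated()
        self.snapshot_path.parent.mkdir(parents=True, exist_ok=True)
        # Write the recorded allocation history as a pickle file.
        torch.cuda.memory._dump_snapshot(str(self.snapshot_path))
        # Stop recording so later batches start from a clean history.
        torch.cuda.memory._record_memory_history(enabled=None)
        print(f"peak GPU memory: {self.peak_bytes / 2**30:.2f} GiB")
```

The resulting .pkl file can be loaded into PyTorch's memory visualizer to inspect allocations over time.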

Testing

Basic unit test matching the ones for other experiment runner config settings.

Other Notes
My editor was stripping trailing whitespace from the files I touched and introducing distracting diffs. Rather than fight with it, I pre-factored that change, so you can just view the last two commits for the substantive change (or use the view that hides whitespace diffs). I can revert that if you don't like formatting changes of untouched lines mixed into PRs though.

My editor is just cleaning these up on save and I think we do want it cleaned up, but trying to avoid clutter in PR review.

Opt-in via --record-memory-snapshot or `experiment_settings.record_memory_snapshot`. When enabled, logs peak GPU memory and dumps a torch.cuda.memory._record_memory_history snapshot for each predict batch to <output_dir>/<query_id>/seed_<n>/mem_snapshot.pkl. Registered after PredictTimer so the snapshot dump runs outside the timer's measurement window.
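As a rough illustration of the output layout and of why registration order matters, here is a small sketch. The helper `snapshot_path` and the toy classes are hypothetical names, not the PR's code, and the sketch assumes hooks fire in registration order, so the timer's end hook runs before the snapshot is dumped.

```python
from pathlib import Path


def snapshot_path(output_dir, query_id, seed):
    """Build <output_dir>/<query_id>/seed_<n>/mem_snapshot.pkl."""
    return Path(output_dir) / query_id / f"seed_{seed}" / "mem_snapshot.pkl"


class TimerSketch:
    """Stand-in for a timing callback: stops its clock at batch end."""

    def on_predict_batch_end(self, events):
        events.append("timer_stop")


class SnapshotSketch:
    """Stand-in for the snapshot callback: dumps at batch end."""

    def on_predict_batch_end(self, events):
        events.append("snapshot_dump")


def run_batch_end_hooks(callbacks):
    # Assumption: hooks are invoked in the order the callbacks were registered.
    events = []
    for cb in callbacks:
        cb.on_predict_batch_end(events)
    return events


# Snapshot registered after the timer, so the dump is not timed.
print(run_batch_end_hooks([TimerSketch(), SnapshotSketch()]))
print(snapshot_path("out", "query_a", 42).as_posix())
```

With this ordering, the timer has already stopped by the time the (potentially slow) dump runs.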

Basic test mirroring the existing test_use_msa_cli / test_use_templates_cli pattern.

GMNGeoffrey · 2026-06-09T01:37:16Z

@christinaflo Sorry, I think I misremembered and thought your comment in #227 said I should share my version, but you actually said you already had one 😬 If this isn't helpful or you like yours better, we can just close this 😄 or if you want to share the version you've got, I'm happy to see if there's any meaningful diff between them and clean up or test

christinaflo · 2026-06-09T04:21:36Z

Yeah, I just wanted to avoid conflicts when mine eventually gets merged in, but I can share mine and you can add some edits on top of it? Mine has some extra functionality I'd like to keep.

jnwei

This is a nice Callback to have, especially for profiling!

One nit: in general, we try to limit the configuration settings available from the command line to those we expect users to need frequently (e.g. templates, colabfold settings).

Since most users will probably not use memory profiling with their inference run, can we keep the settings configurable only through the experiment_settings header in the runner_yaml?
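A minimal sketch of the YAML-only approach, assuming the runner YAML has already been parsed into a dict. The function name `memory_snapshot_enabled` is hypothetical; only the `experiment_settings.record_memory_snapshot` key comes from this PR.

```python
def memory_snapshot_enabled(runner_config):
    """Return the opt-in flag from a parsed runner YAML, defaulting to off."""
    settings = runner_config.get("experiment_settings") or {}
    return bool(settings.get("record_memory_snapshot", False))


# A parsed runner YAML that opts in, and an empty one that does not.
runner_config = {"experiment_settings": {"record_memory_snapshot": True}}
print(memory_snapshot_enabled(runner_config))  # True
print(memory_snapshot_enabled({}))  # False
```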

jandom · 2026-06-09T11:45:58Z

Very handy!

christinaflo · 2026-06-09T17:26:42Z

@GMNGeoffrey I'll tag you in the PR later today. Mine was mostly training-focused, so I think we can add in the additions on top of the general snapshot that you have in this PR.

GMNGeoffrey added 3 commits June 8, 2026 18:23

Strip trailing whitespace in files touched by next commit

6baacbe

Add test for record_memory_snapshot CLI flag

54d4184

jnwei reviewed Jun 9, 2026

jandom added the safe-to-test label Jun 9, 2026

christinaflo closed this Jun 9, 2026

GMNGeoffrey wants to merge 3 commits into aqlaboratory:main from GMNGeoffrey:memsnap-callback
