
[torchbench] opacus_cifar10 memory not freed after each run. #6380

Open
ysiraichi opened this issue Jan 25, 2024 · 2 comments

@ysiraichi
Collaborator

🐛 Bug

Applying the following patch makes the experiment runner print the GPU peak memory usage for each iteration of a given benchmark:

diff --git a/benchmarks/experiment_runner.py b/benchmarks/experiment_runner.py
index 9f55e3f02..a71cc20a2 100644
--- a/benchmarks/experiment_runner.py
+++ b/benchmarks/experiment_runner.py
@@ -202,6 +202,7 @@ class ExperimentRunner:
                        input_tensor):
     tracing_time = None
     total_time_start = time.perf_counter()
+    torch.cuda.reset_peak_memory_stats()
     # Invoke iteration function and measure tracing time w/o waiting on the
     # result.
     if benchmark_experiment.xla:
@@ -210,7 +211,8 @@ class ExperimentRunner:
         input_tensor, collect_full_output=self._args.collect_full_output)
     if benchmark_experiment.xla:
       tracing_time = time.perf_counter() - t_trace_start
-
+    print("> Max MEM (GB):", torch.cuda.max_memory_allocated() / 10**9)
     # Mark step.
     self._mark_step(benchmark_experiment)
     total_time = time.perf_counter() - total_time_start
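
For context, the measurement pattern the patch adds boils down to the following minimal sketch; run_one_iteration is a hypothetical stand-in for one training iteration of the benchmark, and only the torch.cuda.* calls mirror what the patch does:

import torch

def report_peak_memory(run_one_iteration, repeats=4):
    """Print the CUDA peak-memory high-water mark for each iteration.

    run_one_iteration is a hypothetical callable that performs a single
    training iteration.
    """
    for i in range(repeats):
        # Reset the high-water mark so each iteration is measured in isolation.
        torch.cuda.reset_peak_memory_stats()
        run_one_iteration()
        torch.cuda.synchronize()
        print(f"> Max MEM (GB), iteration {i}:", torch.cuda.max_memory_allocated() / 10**9)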

Running the opacus_cifar10 benchmark with the command below makes the memory leak explicit: the amount of used memory keeps growing iteration after iteration.

# This should run 4 (2x2) training iterations.
python xla/benchmarks/experiment_runner.py \
    --suite-name torchbench --accelerator cuda --xla None --dynamo inductor --test train \
    --repeat 2 --iterations-per-run 2 \
    --no-resume --print-subprocess \
    -k opacus_cifar10
> Max MEM (GB): 3.061852672
> Max MEM (GB): 5.957279232
> Max MEM (GB): 8.821041152
> Max MEM (GB): 11.683754496

Expected behavior

After the first training iteration, memory usage stays roughly constant.
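
Expressed as a check (a sketch; peak_memory_plateaus is a hypothetical helper, and the list reuses the per-iteration numbers printed above):

def peak_memory_plateaus(peaks_gb, tolerance=1.05):
    """Return True if peak memory stops growing after the first (warm-up) iteration."""
    steady = peaks_gb[1:]
    return max(steady) <= steady[0] * tolerance

# With the values observed above this returns False, which is the reported bug.
print(peak_memory_plateaus([3.061852672, 5.957279232, 8.821041152, 11.683754496]))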

Environment

  • Reproducible on XLA backend [CPU/TPU]: CUDA
  • torch_xla version: 44660d8

cc @miladm @JackCaoG

@vanbasten23
Collaborator

Does torch.cuda.reset_peak_memory_stats() work on torch_xla:gpu, or does it only work for native PyTorch (Inductor, in this case)?

@ysiraichi
Collaborator Author

I think it only works for Inductor (I did try to use it with PT/XLA, but it didn't work). That's probably because PyTorch's CUDA caching allocator is instrumented to gather this information (just a guess).
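
For what it's worth, on the XLA side device memory can be queried through torch_xla itself (a sketch, assuming xm.get_memory_info is available in this build; the keys of the returned dict vary across torch_xla versions and backends):

import torch_xla.core.xla_model as xm

device = xm.xla_device()
# Returns a dict of device memory statistics; key names (e.g. kb_free / kb_total)
# depend on the torch_xla version and backend.
print("> XLA device memory info:", xm.get_memory_info(device))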
