
Conversation

@achew010 (Contributor) commented on Jun 4, 2024

Description

This PR shifts all GPU memory computation from the end of each experiment to the end of the benchmarking script. This removes the need to rerun experiments: the raw values are saved per experiment, and the aggregated values are computed across all experiments at the end, in gather_report.
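
The idea, as a rough sketch (the file layout and field names below are illustrative assumptions, not the PR's actual code): each experiment only dumps its raw readings, and aggregation happens in a single pass at the end.

```python
import glob
import json

import pandas as pd


def gather_report(result_dir: str) -> pd.DataFrame:
    # Each experiment dumps only raw stats; nothing is aggregated
    # inside the experiment itself.
    rows = []
    for path in glob.glob(f"{result_dir}/*/raw_stats.json"):  # assumed layout
        with open(path) as f:
            rows.append(json.load(f))
    df = pd.DataFrame(rows)
    # Aggregated values are computed once, across all experiments.
    df["peak_gpu_mem_mean"] = df["peak_gpu_mem_per_device"].apply(
        lambda per_device: sum(per_device) / len(per_device)
    )
    return df
```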

@achew010 achew010 requested a review from fabianlim as a code owner June 4, 2024 03:01
gpu_logs = pd.read_csv(gpu_log_filename, skipinitialspace=True)
peak_nvidia_mem_by_device_id, device_name = get_peak_mem_usage_by_device_id(gpu_logs)
experiment_stats[tag].update({
    RESULT_FIELD_RESERVED_GPU_MEM: peak_nvidia_mem_by_device_id.mean(),

Review comment (Contributor):
needs a comment on what we are taking the mean over.
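
One way the requested comment could read (a sketch; this wording is mine, not the PR's eventual fix):

```python
experiment_stats[tag].update({
    # mean of the peak nvidia-smi memory readings, taken across the
    # GPU device ids used by this experiment (one peak value per device)
    RESULT_FIELD_RESERVED_GPU_MEM: peak_nvidia_mem_by_device_id.mean(),
```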

except FileNotFoundError:
    pass

if script_args['log_nvidia_smi'] is True:

Review comment (Contributor):
don't need `is True`
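
That is, a plain truthiness check suffices (sketch of the requested simplification):

```python
if script_args['log_nvidia_smi']:
```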

    RESULT_FIELD_DEVICE_NAME: device_name,
})

if script_args['log_memory_hf'] is True and tag in experiment_stats.keys():

Review comment (Contributor):
see above

    k: v for k, v in experiment_stats[tag].items()
    if any([prefix in k for prefix in memory_metrics_prefixes])
}
if len(memory_metrics.keys())>0:

Review comment (Contributor):
please lint the file with `tox -e lint`
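
For reference, a lint-clean form of the snippet above would typically look like this (assuming the dict is assigned to memory_metrics, as the later check suggests; any() takes a generator directly, and an empty dict is already falsy):

```python
memory_metrics = {
    k: v
    for k, v in experiment_stats[tag].items()
    if any(prefix in k for prefix in memory_metrics_prefixes)
}
if memory_metrics:
```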

@fabianlim fabianlim merged commit bfde526 into foundation-model-stack:dev Jun 7, 2024
@achew010 achew010 deleted the shifted_gpu_mem_compute branch July 26, 2024 04:05