【Hackathon 7th No.21】Add reset_max_memory_reserved/reset_max_memory_allocated APIs to Paddle -part #70032

Qin-sx · 2024-12-07T14:46:59Z

PR Category

User Experience

PR Types

New features

Description

https://github.com/PaddlePaddle/community/blob/master/hackathon/hackathon_7th/%E3%80%90Hackathon%207th%E3%80%91%E4%B8%AA%E4%BA%BA%E6%8C%91%E6%88%98%E8%B5%9B%E2%80%94%E6%A1%86%E6%9E%B6%E5%BC%80%E5%8F%91%E4%BB%BB%E5%8A%A1%E5%90%88%E9%9B%86.md#no21-%E4%B8%BA-paddle-%E6%96%B0%E5%A2%9E-reset_peak_memory_statsreset_max_memory_allocatedmemory_stats-api


paddle-bot · 2024-12-07T14:47:04Z

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.


From00 · 2024-12-14T13:54:33Z

python/paddle/device/cuda/__init__.py

+
+def reset_max_memory_allocated(device: _CudaPlaceLike | None = None) -> None:
+    '''
+    Reset the peak values of GPU memory allocated to the current values.


It would be best to keep the API description consistent with max_memory_allocated:
Reset the peak size of GPU memory that is held by the allocator of the given device.

OK, got it.
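To make the agreed semantics concrete, here is a minimal pure-Python model of a statistic that tracks a current value and a peak value. `PeakStat`, `update` and `reset_peak` are hypothetical names chosen for illustration; they mirror the current/peak/reset behaviour exercised by the C++ test discussed below, but this is a sketch, not Paddle's actual implementation.

```python
class PeakStat:
    """Hypothetical model of a memory statistic with a current and a peak value."""

    def __init__(self) -> None:
        self.current = 0
        self.peak = 0

    def update(self, delta: int) -> None:
        # Allocations increase and frees decrease the current value;
        # the peak only ever moves upward on its own.
        self.current += delta
        self.peak = max(self.peak, self.current)

    def reset_peak(self) -> None:
        # The behaviour described for reset_max_memory_allocated:
        # discard the historical maximum and restart from the current value.
        self.peak = self.current


stat = PeakStat()
stat.update(512)   # allocate 512 bytes
stat.update(-256)  # free 256 bytes
print(stat.current, stat.peak)  # 256 512
stat.reset_peak()
print(stat.current, stat.peak)  # 256 256
```

After a reset, the peak reported later reflects only the maximum reached since that reset, which is what makes such an API useful for measuring one phase of a program in isolation.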

From00 · 2024-12-14T14:00:49Z

test/cpp/fluid/memory/stats_test.cc

+
+      EXPECT_GE(peak_value_func_(stat_type_, 0),
+                current_value_func_(stat_type_, 0));
+      reset_peak_value_func_(stat_type_, 0);


In this test, if the reset at this step does not take effect, wouldn't the subsequent checks still pass?

If the reset does not take effect, the check on the next line (line 107) should fail.

    void ResetPeakValueTest() {
      for (int64_t data : datas_) {
        update_func_(stat_type_, 0, data);
        EXPECT_GE(peak_value_func_(stat_type_, 0),
                  current_value_func_(stat_type_, 0));
        // reset_peak_value_func_(stat_type_, 0);
        printf("data: %ld, Peak Value: %ld, Current Value: %ld\n",
               data,
               peak_value_func_(stat_type_, 0),
               current_value_func_(stat_type_, 0));
        EXPECT_EQ(peak_value_func_(stat_type_, 0),
                  current_value_func_(stat_type_, 0));
      }
    }

If the reset_peak_value_func_ call is commented out, the test fails. Part of the output is shown below:

    115: Test timeout computed to be: 10000000
    115: [==========] Running 4 tests from 1 test case.
    115: [----------] Global test environment set-up.
    115: [----------] 4 tests from StatsTest
    115: [ RUN ] StatsTest.DeviceAllocatedTest
    115: data: 543149808935355, Peak Value: 45703145873829393, Current Value: 45703145873829393
    115: data: 634698327471328, Peak Value: 46337844201300721, Current Value: 46337844201300721
    115: data: 706215795436611, Peak Value: 47044059996737332, Current Value: 47044059996737332
    115: data: 577939367795333, Peak Value: 47621999364532665, Current Value: 47621999364532665
    115: data: 419479490054362, Peak Value: 48041478854587027, Current Value: 48041478854587027
    115: data: 21975227714595, Peak Value: 48063454082301622, Current Value: 48063454082301622
    115: data: 812939817942250, Peak Value: 48876393900243872, Current Value: 48876393900243872
    115: data: 984428837942082, Peak Value: 49860822738185954, Current Value: 49860822738185954
    115: data: 537304104446806, Peak Value: 50398126842632760, Current Value: 50398126842632760
    115: data: 685008544452453, Peak Value: 51083135387085213, Current Value: 51083135387085213
    115: data: 563352858161268, Peak Value: 51646488245246481, Current Value: 51646488245246481
    115: data: 690143831596330, Peak Value: 52336632076842811, Current Value: 52336632076842811
    115: data: 964829938186077, Peak Value: 53301462015028888, Current Value: 53301462015028888
    115: data: 476984078018245, Peak Value: 53778446093047133, Current Value: 53778446093047133
    115: data: 804403365180177, Peak Value: 54582849458227310, Current Value: 54582849458227310
    115: data: -57918691189304, Peak Value: 54582849458227310, Current Value: 54524930767038006
    115: /home/aistudio/test/Paddle/test/cpp/fluid/memory/stats_test.cc:109: Failure
    115: Expected equality of these values:
    115: peak_value_func_(stat_type_, 0)
    115: Which is: 54582849458227310
    115: current_value_func_(stat_type_, 0)
    115: Which is: 54524930767038006
    115: data: 947611269236893, Peak Value: 55472542036274899, Current Value: 55472542036274899
    115: data: 752188963801927, Peak Value: 56224731000076826, Current Value: 56224731000076826
    115: data: 710946451346683, Peak Value: 56935677451423509, Current Value: 56935677451423509
    115: data: -49226452527666, Peak Value: 56935677451423509, Current Value: 56886450998895843
    115: /home/aistudio/test/Paddle/test/cpp/fluid/memory/stats_test.cc:109: Failure
    115: Expected equality of these values:
    115: peak_value_func_(stat_type_, 0)
    115: Which is: 56935677451423509
    115: current_value_func_(stat_type_, 0)
    115: Which is: 56886450998895843
    115: data: -59049377393968, Peak Value: 56935677451423509, Current Value: 56827401621501875
    115: /home/aistudio/test/Paddle/test/cpp/fluid/memory/stats_test.cc:109: Failure
    115: Expected equality of these values:
    115: peak_value_func_(stat_type_, 0)
    115: Which is: 56935677451423509
    115: current_value_func_(stat_type_, 0)
    115: Which is: 56827401621501875
    115: data: 14128239868858, Peak Value: 56935677451423509, Current Value: 56841529861370733
    115: /home/aistudio/test/Paddle/test/cpp/fluid/memory/stats_test.cc:109: Failure
    115: Expected equality of these values:
    115: peak_value_func_(stat_type_, 0)
    115: Which is: 56935677451423509
    115: current_value_func_(stat_type_, 0)
    115: Which is: 56841529861370733
    115: data: 463298869064035, Peak Value: 57304828730434768, Current Value: 57304828730434768
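The failures in the log follow from plain integer arithmetic. The sketch below is a hypothetical Python restatement of the loop in ResetPeakValueTest (the name `replay` and its parameters are illustrative, not part of the C++ test). It starts from the logged value 54582849458227310 and replays two logged deltas: without the reset, the negative delta leaves the peak above the current value, which is exactly what the EXPECT_EQ failure at stats_test.cc:109 reports; with the reset, peak and current always agree.

```python
from __future__ import annotations


def replay(start: int, deltas: list[int], reset: bool) -> list[tuple[int, int]]:
    """Hypothetical replay of the test loop; returns (peak, current) per step."""
    current = peak = start
    history = []
    for delta in deltas:
        current += delta
        peak = max(peak, current)  # the peak never decreases on its own
        assert peak >= current     # mirrors EXPECT_GE in the test
        if reset:
            peak = current         # what reset_peak_value_func_ should do
        history.append((peak, current))  # the pair EXPECT_EQ compares
    return history


start = 54582849458227310
deltas = [-57918691189304, 947611269236893]
for reset in (False, True):
    steps = replay(start, deltas, reset)
    mismatches = [i for i, (p, c) in enumerate(steps) if p != c]
    print(f"reset={reset}: mismatching steps {mismatches}")
```

This also answers the review question from the other direction: a no-op reset is caught only when a negative delta occurs, so the test relies on the random data containing frees as well as allocations.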


From00

LGTM

luotao1 · 2024-12-16T06:34:58Z

Coverage is insufficient; please add unit tests.

SigureMo

LGTMeow for new pybind API without type annotations

Qin-sx · 2024-12-16T09:09:09Z

Coverage is insufficient; please add unit tests.

The uncovered code is lines 326 and 356 of Paddle/python/paddle/device/cuda/__init__.py. These two lines are executed when Paddle is compiled for CPU only (without CUDA).

This case is tested at line 85 of both test files, Paddle/test/legacy_test/test_cuda_reset_max_memory_allocated.py and backup/Paddle/test/legacy_test/test_cuda_reset_max_memory_reserved.py:

    def test_reset_max_memory_allocated_exception(self):
        if core.is_compiled_with_cuda():
            wrong_device = [
                core.CPUPlace(),
                device_count() + 1,
                -2,
                0.5,
                "gpu1",
            ]
            for device in wrong_device:
                with self.assertRaises(BaseException):  # noqa: B017
                    reset_max_memory_allocated(device)
        else:
            with self.assertRaises(ValueError):
                reset_max_memory_allocated()

    def test_reset_max_memory_reserved_exception(self):
        if core.is_compiled_with_cuda():
            wrong_device = [
                core.CPUPlace(),
                device_count() + 1,
                -2,
                0.5,
                "gpu1",
            ]
            for device in wrong_device:
                with self.assertRaises(BaseException):  # noqa: B017
                    reset_max_memory_reserved(device)
        else:
            with self.assertRaises(ValueError):
                reset_max_memory_reserved()

When the tests are run with a CPU-only build of Paddle, these two lines are covered.
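The exception tests imply a contract for the `device` argument: on a CUDA build, a CPUPlace, an out-of-range or negative index, a float and a malformed string such as "gpu1" must all be rejected, and on a CPU-only build the call must raise ValueError. The sketch below encodes that contract in plain Python so it can be checked without a GPU. `check_device`, `compiled_with_cuda` and `num_gpus` are hypothetical names; the accepted string form "gpu:N" and the fallback of None to device 0 are assumptions, and Paddle's real argument handling in python/paddle/device/cuda/__init__.py may differ (the CUDA-side tests only assert `BaseException`, so the real exception types may vary).

```python
import re


def check_device(device: object, compiled_with_cuda: bool, num_gpus: int) -> int:
    """Hypothetical validator: returns a GPU index or raises ValueError."""
    if not compiled_with_cuda:
        # CPU-only build: the branch that only CPU CI can cover.
        raise ValueError("GPU memory statistics require a CUDA build")
    if device is None:
        return 0  # assumption: fall back to the current device, taken as 0 here
    if isinstance(device, int) and not isinstance(device, bool):
        index = device
    elif isinstance(device, str):
        match = re.fullmatch(r"gpu:(\d+)", device)  # assumed "gpu:N" form
        if match is None:
            raise ValueError(f"malformed device string: {device!r}")
        index = int(match.group(1))
    else:
        # Floats, CPU places and any other object types are rejected.
        raise ValueError(f"unsupported device type: {type(device).__name__}")
    if not 0 <= index < num_gpus:
        raise ValueError(f"device index {index} out of range [0, {num_gpus})")
    return index


# object() stands in for core.CPUPlace(); 3 plays device_count() + 1 with 2 GPUs.
for bad in [object(), 3, -2, 0.5, "gpu1"]:
    try:
        check_device(bad, compiled_with_cuda=True, num_gpus=2)
    except ValueError as exc:
        print("rejected:", exc)
```

Keeping the CPU-only check first means a single call with default arguments exercises that branch, which is the same shape as the `else` branches in the two tests above.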

jeff41404 · 2024-12-16T09:48:26Z

Please add links to the RFC and to the PR for the Chinese documentation in the description above.

luotao1 · 2024-12-16T09:54:31Z

Please add links to the RFC and to the PR for the Chinese documentation in the description above.

@jeff41404 Done

jeff41404

LGTM

sunzhongkai588

LGTM for docs

luotao1 · 2024-12-17T08:52:17Z

hi, @Qin-sx

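Thank you very much for your contribution to PaddlePaddle (飞桨). We run an organization called PFCC, which contributes to PaddlePaddle on an ongoing basis by regularly sharing technical knowledge and publishing developer-led tasks; see the profile page at https://github.com/luotao1 for details.
If you are interested in PFCC, please send an email to [email protected] and we will invite you to join.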

Qin-sx added 4 commits December 5, 2024 06:36

added reset peak value initialization

e2299f9

modified: paddle/fluid/pybind/pybind.cc
modified: paddle/phi/core/memory/stats.cc
modified: paddle/phi/core/memory/stats.h
modified: python/paddle/device/cuda/__init__.py

added comments

dc30348

modified: paddle/fluid/pybind/pybind.cc
modified: python/paddle/device/cuda/__init__.py

added cpp tests

d081dde

modified: paddle/fluid/pybind/pybind.cc
modified: paddle/phi/core/memory/stats.cc
modified: paddle/phi/core/memory/stats.h
modified: test/cpp/fluid/memory/stats_test.cc

added python tests

35a12c8

new file: test/legacy_test/test_cuda_memory_stats.py
new file: test/legacy_test/test_cuda_reset_peak_memory_stats.py

paddle-bot bot added the contributor External developers label Dec 7, 2024

added a python test for reset_max_memory_allocated

cb5036c

new file: test/legacy_test/test_cuda_reset_max_memory_allocated.py

luotao1 mentioned this pull request Dec 9, 2024

【Hackathon 7th】Open-Source Contribution Individual Challenge #68244

Closed

luotao1 assigned luotao1 and From00 Dec 9, 2024

luotao1 added PaddlePaddle Hackathon API labels Dec 9, 2024

Qin-sx added 3 commits December 9, 2024 21:48

formatted by pre-commit

5d28856

modified: python/paddle/device/cuda/__init__.py
modified: test/legacy_test/test_cuda_memory_stats.py
modified: test/legacy_test/test_cuda_reset_max_memory_allocated.py
modified: test/legacy_test/test_cuda_reset_peak_memory_stats.py

formatted by pre-commit (clang-format)

f6b3d84

modified: paddle/fluid/pybind/pybind.cc
modified: paddle/phi/core/memory/stats.cc
modified: paddle/phi/core/memory/stats.h
modified: test/cpp/fluid/memory/stats_test.cc

added reset max memory reserved function

b6ea9be

modified: python/paddle/device/cuda/__init__.py
modified: test/legacy_test/test_cuda_reset_max_memory_allocated.py
new file: test/legacy_test/test_cuda_reset_max_memory_reserved.py

From00 reviewed Dec 14, 2024


deleted memory stats and reset peak memory stats

caad368

modified: paddle/fluid/pybind/pybind.cc
modified: python/paddle/device/cuda/__init__.py
deleted: test/legacy_test/test_cuda_memory_stats.py
deleted: test/legacy_test/test_cuda_reset_peak_memory_stats.py

From00 approved these changes Dec 16, 2024


luotao1 changed the title ~~【Hackathon 7th No.21】Add reset_peak_memory_stats/reset_max_memory_allocated/memory_stats APIs to Paddle~~ 【Hackathon 7th No.21】Add reset_max_memory_reserved/reset_max_memory_allocated APIs to Paddle Dec 16, 2024

luotao1 assigned jeff41404, SigureMo and sunzhongkai588 Dec 16, 2024

SigureMo approved these changes Dec 16, 2024


jeff41404 approved these changes Dec 16, 2024


sunzhongkai588 approved these changes Dec 17, 2024


luotao1 merged commit 202aff3 into PaddlePaddle:develop Dec 17, 2024
28 of 29 checks passed

luotao1 changed the title ~~【Hackathon 7th No.21】Add reset_max_memory_reserved/reset_max_memory_allocated APIs to Paddle~~ 【Hackathon 7th No.21】Add reset_max_memory_reserved/reset_max_memory_allocated APIs to Paddle -part Dec 17, 2024

Qin-sx mentioned this pull request Jan 2, 2025

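Big perk from the PaddlePaddle (飞桨) open-source community: a 2000-yuan membership gift exclusively for open-source contributors PaddlePaddle/community#1015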

Open


Qin-sx commented Dec 7, 2024 • edited by luotao1
