【Hackathon 7th No.21】Add reset_max_memory_reserved/reset_max_memory_allocated API to Paddle -part #70032

Merged
merged 9 commits into from
Dec 17, 2024

	modified:   paddle/fluid/pybind/pybind.cc
	modified:   paddle/phi/core/memory/stats.cc
	modified:   paddle/phi/core/memory/stats.h
	modified:   python/paddle/device/cuda/__init__.py
	modified:   paddle/fluid/pybind/pybind.cc
	modified:   python/paddle/device/cuda/__init__.py
	modified:   paddle/fluid/pybind/pybind.cc
	modified:   paddle/phi/core/memory/stats.cc
	modified:   paddle/phi/core/memory/stats.h
	modified:   test/cpp/fluid/memory/stats_test.cc
	new file:   test/legacy_test/test_cuda_memory_stats.py
	new file:   test/legacy_test/test_cuda_reset_peak_memory_stats.py

paddle-bot bot commented Dec 7, 2024

Your PR has been submitted. Thanks for your contribution to this open-source project!
Please wait for the CI results first. See the Paddle CI Manual for details.

@paddle-bot paddle-bot bot added the contributor External developers label Dec 7, 2024
	new file:   test/legacy_test/test_cuda_reset_max_memory_allocated.py
	modified:   python/paddle/device/cuda/__init__.py
	modified:   test/legacy_test/test_cuda_memory_stats.py
	modified:   test/legacy_test/test_cuda_reset_max_memory_allocated.py
	modified:   test/legacy_test/test_cuda_reset_peak_memory_stats.py
	modified:   paddle/fluid/pybind/pybind.cc
	modified:   paddle/phi/core/memory/stats.cc
	modified:   paddle/phi/core/memory/stats.h
	modified:   test/cpp/fluid/memory/stats_test.cc
	modified:   python/paddle/device/cuda/__init__.py
	modified:   test/legacy_test/test_cuda_reset_max_memory_allocated.py
	new file:   test/legacy_test/test_cuda_reset_max_memory_reserved.py

def reset_max_memory_allocated(device: _CudaPlaceLike | None = None) -> None:
    '''
    Reset the peak values of GPU memory allocated to the current values.
Contributor

The API description should ideally be consistent with that of max_memory_allocated:
Reset the peak size of GPU memory that is held by the allocator of the given device.

Contributor Author

OK, got it.


      EXPECT_GE(peak_value_func_(stat_type_, 0),
                current_value_func_(stat_type_, 0));
      reset_peak_value_func_(stat_type_, 0);
Contributor

In this test, if the reset at this step does not succeed, wouldn't the checks that follow still pass?

Contributor Author

If the reset does not succeed, the check on the next line (line 107) should fail.

  void ResetPeakValueTest() {
    for (int64_t data : datas_) {
      update_func_(stat_type_, 0, data);

      EXPECT_GE(peak_value_func_(stat_type_, 0),
                current_value_func_(stat_type_, 0));
      // reset_peak_value_func_(stat_type_, 0);
      printf("data: %ld, Peak Value: %ld, Current Value: %ld\n",
             data,
             peak_value_func_(stat_type_, 0),
             current_value_func_(stat_type_, 0));
      EXPECT_EQ(peak_value_func_(stat_type_, 0),
                current_value_func_(stat_type_, 0));
    }
  }

If the call to reset_peak_value_func_ is commented out, the test fails. Part of the run output is shown below.

115: Test timeout computed to be: 10000000
115: [==========] Running 4 tests from 1 test case.
115: [----------] Global test environment set-up.
115: [----------] 4 tests from StatsTest
115: [ RUN      ] StatsTest.DeviceAllocatedTest
115: data: 543149808935355, Peak Value: 45703145873829393, Current Value: 45703145873829393
115: data: 634698327471328, Peak Value: 46337844201300721, Current Value: 46337844201300721
115: data: 706215795436611, Peak Value: 47044059996737332, Current Value: 47044059996737332
115: data: 577939367795333, Peak Value: 47621999364532665, Current Value: 47621999364532665
115: data: 419479490054362, Peak Value: 48041478854587027, Current Value: 48041478854587027
115: data: 21975227714595, Peak Value: 48063454082301622, Current Value: 48063454082301622
115: data: 812939817942250, Peak Value: 48876393900243872, Current Value: 48876393900243872
115: data: 984428837942082, Peak Value: 49860822738185954, Current Value: 49860822738185954
115: data: 537304104446806, Peak Value: 50398126842632760, Current Value: 50398126842632760
115: data: 685008544452453, Peak Value: 51083135387085213, Current Value: 51083135387085213
115: data: 563352858161268, Peak Value: 51646488245246481, Current Value: 51646488245246481
115: data: 690143831596330, Peak Value: 52336632076842811, Current Value: 52336632076842811
115: data: 964829938186077, Peak Value: 53301462015028888, Current Value: 53301462015028888
115: data: 476984078018245, Peak Value: 53778446093047133, Current Value: 53778446093047133
115: data: 804403365180177, Peak Value: 54582849458227310, Current Value: 54582849458227310
115: data: -57918691189304, Peak Value: 54582849458227310, Current Value: 54524930767038006
115: /home/aistudio/test/Paddle/test/cpp/fluid/memory/stats_test.cc:109: Failure
115: Expected equality of these values:
115:   peak_value_func_(stat_type_, 0)
115:     Which is: 54582849458227310
115:   current_value_func_(stat_type_, 0)
115:     Which is: 54524930767038006
115: data: 947611269236893, Peak Value: 55472542036274899, Current Value: 55472542036274899
115: data: 752188963801927, Peak Value: 56224731000076826, Current Value: 56224731000076826
115: data: 710946451346683, Peak Value: 56935677451423509, Current Value: 56935677451423509
115: data: -49226452527666, Peak Value: 56935677451423509, Current Value: 56886450998895843
115: /home/aistudio/test/Paddle/test/cpp/fluid/memory/stats_test.cc:109: Failure
115: Expected equality of these values:
115:   peak_value_func_(stat_type_, 0)
115:     Which is: 56935677451423509
115:   current_value_func_(stat_type_, 0)
115:     Which is: 56886450998895843
115: data: -59049377393968, Peak Value: 56935677451423509, Current Value: 56827401621501875
115: /home/aistudio/test/Paddle/test/cpp/fluid/memory/stats_test.cc:109: Failure
115: Expected equality of these values:
115:   peak_value_func_(stat_type_, 0)
115:     Which is: 56935677451423509
115:   current_value_func_(stat_type_, 0)
115:     Which is: 56827401621501875
115: data: 14128239868858, Peak Value: 56935677451423509, Current Value: 56841529861370733
115: /home/aistudio/test/Paddle/test/cpp/fluid/memory/stats_test.cc:109: Failure
115: Expected equality of these values:
115:   peak_value_func_(stat_type_, 0)
115:     Which is: 56935677451423509
115:   current_value_func_(stat_type_, 0)
115:     Which is: 56841529861370733
115: data: 463298869064035, Peak Value: 57304828730434768, Current Value: 57304828730434768
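
The pattern in this log (the equality check holds while the current value keeps growing and breaks as soon as an update is negative) can be reproduced with a small model of a current/peak counter. The following is a minimal Python sketch, not Paddle's implementation: the `PeakStat` class, its method names, and the sample deltas are hypothetical, and it assumes only that the peak is the running maximum of the current value and that a reset sets the peak to the current value.

```python
class PeakStat:
    """Toy model of a statistic that tracks a current and a peak value."""

    def __init__(self) -> None:
        self.current = 0
        self.peak = 0

    def update(self, delta: int) -> None:
        # Apply an allocation (positive) or a free (negative), then raise
        # the peak if the new current value exceeds it.
        self.current += delta
        self.peak = max(self.peak, self.current)

    def reset_peak(self) -> None:
        # Assumed reset semantics: the peak falls back to the current value.
        self.peak = self.current


def peak_equals_current_after_each_update(deltas, reset: bool) -> bool:
    """Return True if peak == current after every update in `deltas`."""
    stat = PeakStat()
    for delta in deltas:
        stat.update(delta)
        if reset:
            stat.reset_peak()
        if stat.peak != stat.current:
            return False
    return True


deltas = [100, 50, -30, 20]  # illustrative values, not taken from the log
with_reset = peak_equals_current_after_each_update(deltas, reset=True)
without_reset = peak_equals_current_after_each_update(deltas, reset=False)
print(with_reset, without_reset)
```

In this model, a sequence of only positive updates would satisfy the equality check with or without the reset, so the check can detect a missing reset only when the data contains negative values.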

	modified:   paddle/fluid/pybind/pybind.cc
	modified:   python/paddle/device/cuda/__init__.py
	deleted:    test/legacy_test/test_cuda_memory_stats.py
	deleted:    test/legacy_test/test_cuda_reset_peak_memory_stats.py
Contributor

@From00 From00 left a comment

LGTM

@luotao1 luotao1 changed the title from "【Hackathon 7th No.21】Add reset_peak_memory_stats/reset_max_memory_allocated/memory_stats API to Paddle" to "【Hackathon 7th No.21】Add reset_max_memory_reserved/reset_max_memory_allocated API to Paddle" Dec 16, 2024
Contributor

luotao1 commented Dec 16, 2024

(image) Coverage is insufficient; unit tests need to be added.

Member

@SigureMo SigureMo left a comment

LGTMeow 🐾 for new pybind API without type annotations

Contributor Author

Qin-sx commented Dec 16, 2024

(image) Coverage is insufficient; unit tests need to be added.

The uncovered code is lines 326 and 356 of Paddle/python/paddle/device/cuda/__init__.py. These two lines are executed when Paddle is built for CPU.
Lcov

This case is tested at line 85 of the test files Paddle/test/legacy_test/test_cuda_reset_max_memory_allocated.py and backup/Paddle/test/legacy_test/test_cuda_reset_max_memory_reserved.py.

    def test_reset_max_memory_allocated_exception(self):
        if core.is_compiled_with_cuda():
            wrong_device = [
                core.CPUPlace(),
                device_count() + 1,
                -2,
                0.5,
                "gpu1",
            ]
            for device in wrong_device:
                with self.assertRaises(BaseException):  # noqa: B017
                    reset_max_memory_allocated(device)
        else:
            with self.assertRaises(ValueError):
                reset_max_memory_allocated()

    def test_reset_max_memory_reserved_exception(self):
        if core.is_compiled_with_cuda():
            wrong_device = [
                core.CPUPlace(),
                device_count() + 1,
                -2,
                0.5,
                "gpu1",
            ]
            for device in wrong_device:
                with self.assertRaises(BaseException):  # noqa: B017
                    reset_max_memory_reserved(device)
        else:
            with self.assertRaises(ValueError):
                reset_max_memory_reserved()

When the tests are run with a CPU-compiled build of Paddle, the two lines above are covered.
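
The CPU-build branch discussed in this comment can be illustrated without Paddle or a GPU. The sketch below is an illustration under stated assumptions, not Paddle's code: `reset_peak_stub` and its `compiled_with_cuda` parameter are hypothetical stand-ins for the wrapper and the build check, and the error message is invented. It mirrors only the behavior stated in this thread, namely that the call raises `ValueError` on a build without CUDA.

```python
import unittest


def reset_peak_stub(device=None, *, compiled_with_cuda: bool = True) -> None:
    # Hypothetical stand-in for the wrapper: reject CPU-only builds up front.
    if not compiled_with_cuda:
        raise ValueError("this API requires a build with CUDA support")
    # A real implementation would resolve `device` and reset the peak here.


class CpuBuildBranchTest(unittest.TestCase):
    def test_cpu_build_raises(self):
        # Exercises the branch that is reached only on a CPU-only build.
        with self.assertRaises(ValueError):
            reset_peak_stub(compiled_with_cuda=False)

    def test_gpu_build_does_not_raise(self):
        # On a CUDA build the guard is skipped and no exception is raised.
        reset_peak_stub(compiled_with_cuda=True)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(CpuBuildBranchTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())
```

Passing the build flag as a parameter keeps the sketch self-contained; the tests in this PR branch on `core.is_compiled_with_cuda()` instead, so each build configuration exercises only one side of the `if`.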

@jeff41404
Contributor

Please add links to the RFC and to the PR for the Chinese documentation in the description above.

Contributor

luotao1 commented Dec 16, 2024

Please add links to the RFC and to the PR for the Chinese documentation in the description above.

@jeff41404 Done

Contributor

@jeff41404 jeff41404 left a comment

LGTM

Contributor

@sunzhongkai588 sunzhongkai588 left a comment

LGTM for docs

@luotao1 luotao1 merged commit 202aff3 into PaddlePaddle:develop Dec 17, 2024
28 of 29 checks passed
@luotao1 luotao1 changed the title from "【Hackathon 7th No.21】Add reset_max_memory_reserved/reset_max_memory_allocated API to Paddle" to "【Hackathon 7th No.21】Add reset_max_memory_reserved/reset_max_memory_allocated API to Paddle -part" Dec 17, 2024
Contributor

luotao1 commented Dec 17, 2024

hi, @Qin-sx

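  • Thank you very much for your contribution to PaddlePaddle. We run an organization called PFCC, which keeps contributing to PaddlePaddle by regularly sharing technical knowledge and publishing developer-led tasks. For details, see the profile page at https://github.com/luotao1.
  • If you are interested in PFCC, please send an email to [email protected] and we will invite you to join.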

6 participants