Zgc/ditorch add process monitor tool #58

Merged (2 commits), Oct 11, 2024


zhaoguochun1995 (Collaborator) commented Oct 10, 2024

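Periodically records key host-side and device-side information about a training process, such as memory usage and chip utilization, without modifying the training process itself.
We kept needing to monitor memory and device utilization while debugging, so this PR upstreams the tool we built along the way for future use.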

[Screenshot 2024-10-11 11:58:23 AM]
[Screenshot 2024-10-11 11:59:05 AM]
process_monitor_result_camb_pid129082_2024-10-11-11-54-21.csv
process_monitor_result_ascend_pid871524_2024-10-11-11-57-03.csv
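The tool's source is not shown on this page, so here is only a minimal sketch of the non-intrusive sampling idea, assuming Linux: the target process is observed through `/proc/<pid>/status` and never modified, and each sample is appended as a row to a CSV. The functions `read_host_memory_kb` and `monitor` are hypothetical names, and the output filename only imitates the attached result files. Device-side metrics (chip utilization, device memory) are not collected here. On real hardware they would come from the vendor's tooling, which this sketch does not call.

```python
import csv
import os
import time
from datetime import datetime


def read_host_memory_kb(pid: int) -> dict:
    """Read resident (VmRSS) and virtual (VmSize) memory in kB from /proc (Linux only)."""
    fields = {}
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            key, _, value = line.partition(":")
            if key in ("VmRSS", "VmSize"):
                fields[key] = int(value.split()[0])  # value looks like "  123456 kB"
    return fields


def monitor(pid: int, interval_s: float = 1.0, samples: int = 10, device: str = "host") -> str:
    """Hypothetical sampler: periodically record host memory of `pid` into a CSV file.

    The target is only read via /proc, so it needs no code changes. The `device`
    label is just part of the filename; no device metrics are gathered.
    """
    stamp = datetime.now().strftime("%Y-%m-%d-%H-%M-%S")
    path = f"process_monitor_result_{device}_pid{pid}_{stamp}.csv"
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["time", "VmRSS_kB", "VmSize_kB"])
        writer.writeheader()
        for _ in range(samples):
            try:
                mem = read_host_memory_kb(pid)
            except FileNotFoundError:
                break  # target process has exited
            writer.writerow({
                "time": datetime.now().isoformat(timespec="seconds"),
                "VmRSS_kB": mem.get("VmRSS", 0),
                "VmSize_kB": mem.get("VmSize", 0),
            })
            f.flush()  # keep rows on disk even if the monitor is killed
            time.sleep(interval_s)
    return path
```

For example, `monitor(871524, interval_s=1.0, samples=600)` would sample that PID once per second for about ten minutes and stop early if the process exits. Reading `/proc` from a separate process is what keeps the approach non-intrusive: the training job needs no hooks or restarts.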

zhaoguochun1995 force-pushed the zgc/ditorch_add_process_minor_tool branch from 5b3e91a to e4c65d3 on October 11, 2024, 03:55
zhaoguochun1995 changed the title from "Zgc/ditorch add process minor tool" to "Zgc/ditorch add process monitor tool" on Oct 11, 2024
yangbofun merged commit 70af878 into main on Oct 11, 2024
13 checks passed
yangbofun deleted the zgc/ditorch_add_process_minor_tool branch on October 11, 2024, 07:11

2 participants