@shishaochen Could you please take a look? Thanks!
As for the Horovod issue, the behavior should come from Horovod itself, not from deepmd-kit.
Hello,
I hope you are doing well. Attached please find the timeline.json file for a training run on a single node with 16 workers and a local batch size of 32. TF_Intra_Op is set to 12 and TF_Inter_Op is set to 4, with OpenMP=3. I expected to see 4 compute threads, reflecting the TF_Inter_Op setting.
The strange part of this run is that threads 4-9 spend so much time in HorovodAllReduce. Is this a behavior that you have noticed as well?
In addition, the JSON input has options for "enable_profiler" and "profiling", which give differing outputs with respect to the TensorBoard trace. May I also ask what the difference between these two options is?
Thank you very much for your time!
Warm Regards
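
For anyone wanting to put a number on the observation above, one option is to sum the time per thread spent in a given op directly from the trace. The following is only a minimal sketch, assuming timeline.json follows the Chrome trace event format (a top-level `traceEvents` list whose complete events have `ph` equal to `"X"` and carry `name`, `tid`, and `dur` in microseconds). The helper name `time_per_thread`, the case-insensitive substring match on the event name, and the sample events are illustrative assumptions, not part of deepmd-kit or Horovod.

```python
import json
from collections import defaultdict


def time_per_thread(trace, name_substring):
    """Sum durations (in microseconds) of matching complete events per thread ID."""
    totals = defaultdict(int)
    needle = name_substring.lower()
    for event in trace.get("traceEvents", []):
        # Only "X" (complete) events carry their own duration.
        if event.get("ph") != "X":
            continue
        # Assumption: the op of interest is identifiable from the event name.
        if needle in event.get("name", "").lower():
            totals[event.get("tid")] += event.get("dur", 0)
    return dict(totals)


def load_trace(path):
    """Load a Chrome-trace-style JSON file from disk."""
    with open(path) as f:
        return json.load(f)


if __name__ == "__main__":
    # Hypothetical in-memory trace; replace with load_trace("timeline.json").
    sample = {
        "traceEvents": [
            {"name": "HorovodAllreduce", "ph": "X", "tid": 4, "ts": 0, "dur": 100},
            {"name": "MatMul", "ph": "X", "tid": 1, "ts": 5, "dur": 40},
            {"name": "HorovodAllreduce", "ph": "X", "tid": 5, "ts": 10, "dur": 30},
        ]
    }
    for tid, micros in sorted(time_per_thread(sample, "horovodallreduce").items()):
        print(f"thread {tid}: {micros} us")
```

Comparing these totals against the total duration of all events on the same threads would show whether the allreduce time dominates or only looks large in the viewer.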