
[AutoParallel] Visualize flow parallel timing diagram in static graph mode #58313

Merged
merged 60 commits into from
Nov 21, 2023

Conversation

@AndSonder (Contributor) commented Oct 23, 2023

PR types

Others

PR changes

Others

Description

Visualize the pipeline-parallel timing diagram in static graph mode

In static graph mode, auto-parallel execution calls StandaloneExecutor::Run on the C++ side, which runs the pre-split Jobs in sequence. The main goal of this PR is to visualize the execution timeline of the Jobs on different devices, so it can be inspected with Chrome::tracing.

How to use?

The following walks through using the test_pipeline_scheduler unit test to generate the log files and produce the visualized timing diagram:

Since the unit test clears the generated log files by default, first remove the log-clearing logic and specify the log directory:

import os
import subprocess
import sys
import unittest


class TestFThenBPass(unittest.TestCase):
    def test_pp2(self):
        file_dir = os.path.dirname(os.path.abspath(__file__))
        launch_model_path = os.path.join(
            file_dir, "pipeline_scheduler_unittest.py"
        )

        if os.environ.get("WITH_COVERAGE", "OFF") == "ON":
            coverage_args = ["-m", "coverage", "run", "--branch", "-p"]
        else:
            coverage_args = []

        # tmp_dir = tempfile.TemporaryDirectory()
        cmd = (
            [sys.executable, "-u"]
            + coverage_args
            + [
                "-m",
                "paddle.distributed.launch",
                "--devices",
                "0,1",
                "--log_dir",
                "/home/root/Paddle/build/Testing/Temporary",
                launch_model_path,
            ]
        )

        process = subprocess.Popen(cmd)
        process.wait()
        self.assertEqual(process.returncode, 0)

        # tmp_dir.cleanup()

1. With the flag enabled, run the training process and generate the logs:

FLAGS_auto_parallel_profiler=1 GLOG_v=0 ctest -R test_pipeline_scheduler $VV

GLOG_v=0 keeps the log output as small as possible, reducing the time spent on regex matching.

2. Run profiler_helper_static.py to generate the JSON file.

[screenshot]

3. Open the JSON file with Chrome Tracing.

[screenshot]

The pipeline_profile_perfetto.json file can also be opened with Perfetto.

[screenshot]
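The conversion step performed by profiler_helper_static.py emits the Chrome tracing "trace event" format. A minimal sketch of that conversion, assuming parsed records of the form (device, job_type, start_ms, end_ms) — the helper name and record layout here are hypothetical, not the script's actual API:

```python
import json

def build_chrome_trace(records, out_path):
    """Convert (device, job_type, start_ms, end_ms) records into
    Chrome tracing's JSON trace-event format (illustrative sketch)."""
    events = []
    for device, job_type, start_ms, end_ms in records:
        events.append({
            "name": job_type,
            "ph": "X",                         # "complete" event: start + duration
            "pid": device,                     # one row per device in the viewer
            "tid": 0,
            "ts": start_ms * 1000,             # trace events use microseconds
            "dur": (end_ms - start_ms) * 1000,
        })
    trace = {"traceEvents": events}
    with open(out_path, "w") as f:
        json.dump(trace, f)
    return trace

trace = build_chrome_trace([(0, "forward", 10.0, 12.5)], "pipeline_profile.json")
```

The resulting file can be loaded directly in chrome://tracing or Perfetto, which group events by `pid`/`tid` into per-device rows.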

Related PRs:

@paddle-bot bot commented Oct 23, 2023

Your PR has been submitted. Thanks for your contribution!
Please wait for the result of CI first. See the Paddle CI Manual for details.

@paddle-bot paddle-bot bot added the contributor External developers label Oct 23, 2023
@@ -38,6 +38,10 @@
#include "paddle/fluid/platform/device_event.h"
#include "paddle/phi/backends/device_manager.h"

#if defined(PADDLE_WITH_CUDA)
#include "paddle/phi/kernels/autotune/gpu_timer.h"
Contributor:

gpu_timer.h is relevant only to the concrete implementation of the class interface, not to the class definition; it should be included only in the .cc files that use it, not in the base-class header.

@@ -103,6 +104,16 @@ ProgramInterpreter::~ProgramInterpreter() {
}

void ProgramInterpreter::RunImpl() {
#if defined(PADDLE_WITH_CUDA)
if (FLAGS_auto_parallel_profiler) {
// Note(sonder): Record the start time of the each stream.
Contributor:

A NOTE is generally used to explain complex, hard-to-read code, or to convey information the code itself cannot express. These few lines are simple and direct, and this NOTE merely restates the code, so it can be dropped.

#if defined(PADDLE_WITH_CUDA) || defined(PADDLE_WITH_HIP)
stream_timers_.clear();
std::vector<gpuStream_t> streams;
bool has_default_stream = false;
Contributor:

The Paddle framework never uses the null (default) stream, so there is no need to handle that case.

void Start() {
struct timeval time_now {};
gettimeofday(&time_now, nullptr);
start_time_ = (time_now.tv_sec * 1000) + (time_now.tv_usec / 1000.0);
Contributor:

A comment could be added here explaining why CPU time is used as start_time.
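For reference, the C++ snippet above records a host-side wall-clock timestamp in milliseconds — presumably so that start times recorded by different processes share one common clock and can be aligned on a single timeline. A Python rendering of the same formula, purely illustrative:

```python
import time

def wall_clock_ms() -> float:
    """Wall-clock time in milliseconds since the epoch, mirroring the
    C++ formula: tv_sec * 1000 + tv_usec / 1000.0."""
    return time.time() * 1000.0

now_ms = wall_clock_ms()
```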

double start_time, end_time;
std::tie(start_time, end_time) =
interpretercores_[job_idx]->InterpreterRunTime();
VLOG(0) << "Profiler Info: Job (" << job_idx << "), type = " << job_type
Contributor:

Add a comment explaining what this log is for; otherwise someone unfamiliar with it may change it by mistake.
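This VLOG line matters because profiler_helper_static.py recovers the timing records from the logs via regex matching. A hedged sketch of what such a parser could look like — the pattern below matches only the job index and type visible in the diff; the actual log line also carries the start/end times, which this sketch does not attempt to reproduce:

```python
import re

# Hypothetical parser for the "Profiler Info" lines emitted during
# InterpreterRunTime() logging; illustrative only.
PROFILER_RE = re.compile(r"Profiler Info: Job \((\d+)\), type = (\w+)")

def parse_profiler_line(line):
    """Return {'job_idx', 'job_type'} if the line is a profiler record,
    else None."""
    m = PROFILER_RE.search(line)
    if m is None:
        return None
    return {"job_idx": int(m.group(1)), "job_type": m.group(2)}

record = parse_profiler_line("I1023 12:00:00 ... Profiler Info: Job (3), type = forward")
```

Anchoring the script to a fixed log prefix like this is exactly why the reviewer asks for a comment: renaming the log text would silently break the parser.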

@@ -0,0 +1,117 @@
# Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.
Contributor:

This script could be placed under distributed/auto_parallel/static/.

@AndSonder AndSonder changed the title Visualize flow parallel timing diagram in static graph mode [feat] Visualize flow parallel timing diagram in static graph mode Oct 31, 2023
@AndSonder AndSonder changed the title [feat] Visualize flow parallel timing diagram in static graph mode [feat][AutoParallel] Visualize flow parallel timing diagram in static graph mode Oct 31, 2023
const std::vector<std::string>& feed_names, bool need_fetch = true) = 0;
const std::vector<std::string>& feed_names,
bool need_fetch = true,
bool enable_auto_parallel_profiler = false) = 0;
Contributor:

Suggested change
bool enable_auto_parallel_profiler = false) = 0;
bool enable_job_schedule_profiler = false) = 0;

Contributor Author:

done

@@ -34,6 +34,10 @@ PADDLE_DEFINE_EXPORTED_bool(new_executor_use_local_scope,
true,
"Use local_scope in new executor(especially used "
"in UT), can turn off for better performance");
PADDLE_DEFINE_EXPORTED_bool(auto_parallel_profiler,
Contributor:

Why is this FLAGS still needed?

Contributor Author:

Deleted.

enable_auto_parallel_profiler_ = enable_auto_parallel_profiler;

if (enable_auto_parallel_profiler_) {
#if defined(PADDLE_WITH_CUDA) || defined(PADDLE_WITH_HIP)
Contributor:

The compile-time guard macro should be placed outside the conditional; otherwise, when the macro condition does not hold, you end up with odd code like:

if (enable_auto_parallel_profiler_) {
   // empty
}

Contributor Author:

(Quoting the review comment above about moving the compile-time guard outside the conditional.)

Fixed.


if (enable_auto_parallel_profiler_) {
#if defined(PADDLE_WITH_CUDA) || defined(PADDLE_WITH_HIP)
gpuStream_t calculated_stream =
Contributor:

Is it necessary to fetch and set the same compute stream on every run? Could CalculateStreamTimer obtain the compute stream internally at construction time, so that the caller does not need to set it?

Contributor Author:

(Quoting the review comment above.)

Fixed: place_ is now passed in at construction, and the compute stream is created internally.

@@ -211,6 +219,12 @@ class ProgramInterpreter : public InterpreterBaseImpl {
InstructionSchedulingPriorityLess instruction_scheduling_priority_less;

std::vector<HookFunc> hookfuncs_;

#if defined(PADDLE_WITH_CUDA) || defined(PADDLE_WITH_HIP)
phi::CalculatedStreamTimer calculated_stream_timer_;
Contributor:

Suggested change
phi::CalculatedStreamTimer calculated_stream_timer_;
phi::CalculatedStreamTimer calculate_stream_timer_;

Contributor Author:

done

#if defined(PADDLE_WITH_CUDA) || defined(PADDLE_WITH_HIP)
phi::CalculatedStreamTimer calculated_stream_timer_;
#endif
size_t last_calculated_instr_id;
Contributor:

Suggested change
size_t last_calculated_instr_id;
size_t last_calculate_instr_id_;

Contributor Author:

done

@@ -1040,6 +1063,15 @@ void ProgramInterpreter::RunInstruction(const Instruction& instr_node) {

try {
instr_node.WaitEvent(place_);
if (enable_auto_parallel_profiler_) {
#if defined(PADDLE_WITH_CUDA) || defined(PADDLE_WITH_HIP)
if (!interpreter::IsCommunicationOp(instr_node) &&
Contributor:

!calculated_stream_timer_.IsStarted() is just a simple flag check, and it is false for most operators, while !interpreter::IsCommunicationOp(instr_node) involves a fair amount of branching logic. In this situation, !calculated_stream_timer_.IsStarted() should be the first operand of the && expression, so that C++ short-circuit evaluation reduces the number of actual calls to !interpreter::IsCommunicationOp(instr_node) and improves performance.
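The short-circuit argument can be illustrated with a small sketch. Python's `and` short-circuits the same way C++'s `&&` does; the function names below are hypothetical stand-ins, not Paddle code:

```python
calls = {"expensive": 0}

def is_communication_op(instr):
    """Stand-in for the costly check: count how often it actually runs."""
    calls["expensive"] += 1
    return instr.get("comm", False)

def timer_started():
    """Stand-in for the cheap flag check; True once the timer is running."""
    return True

instrs = [{"comm": False} for _ in range(100)]
for instr in instrs:
    # Cheap check first: `not timer_started()` is False here, so the
    # expensive call on the right is never evaluated.
    if not timer_started() and not is_communication_op(instr):
        pass

print(calls["expensive"])  # 0 — every expensive call was short-circuited away
```

Ordering the operands the other way would invoke the expensive check once per instruction.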

Contributor Author:

done

@@ -114,6 +114,7 @@ def set_field_default_config(category, field, default_value):
set_field_default_config(PIPELINE, "accumulate_steps", 1)
set_field_default_config(PIPELINE, "generation_batch_size", 1)
set_field_default_config(PIPELINE, "enable_send_recv_overlap", False)
set_field_default_config(PIPELINE, "schedule_profiler", False)
Contributor:

As written, this is only an on/off switch for the profiler, with no way to specify a sampling window. Could it support setting pipeline.schedule_profiler_start and pipeline.schedule_profiler_end directly, with a default of [-1, -1) meaning disabled; otherwise the profiler is enabled within [start, end), and the entire job exits after step end-1?
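The proposed [start, end) gating could be sketched as follows — the option names mirror the pipeline.schedule_profiler_start/end settings suggested in this review and are hypothetical:

```python
def profiler_active(step: int, start: int = -1, end: int = -1) -> bool:
    """Return True when `step` falls in the half-open sampling window
    [start, end); the default [-1, -1) means the profiler is off."""
    if start < 0:
        return False
    return start <= step < end

# With start=3, end=6 the profiler runs on steps 3, 4 and 5 only.
active_steps = [s for s in range(10) if profiler_active(s, start=3, end=6)]
```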

@AndSonder (Contributor Author) commented Nov 13, 2023:

(Quoting the review comment above about a configurable sampling window.)

There is already code that combines Profiler_auto.nvprof_start and Profiler_auto.nvprof_end to control the sampling window, in a PaddleNLP PR:

@PaddlePaddle PaddlePaddle locked and limited conversation to collaborators Nov 15, 2023
@PaddlePaddle PaddlePaddle unlocked this conversation Nov 15, 2023
@From00 (Contributor) left a comment:

LGTM

@From00 From00 merged commit 192a5f8 into PaddlePaddle:develop Nov 21, 2023
28 checks passed
SecretXV pushed a commit to SecretXV/Paddle that referenced this pull request Nov 28, 2023
[AutoParallel] Visualize flow parallel timing diagram in static graph mode (PaddlePaddle#58313)

* merge from openvino master

* add InterpreterRunTime() to record interpreter's run time

* add profiler helper static to produce json file

* add color map and support perfetto format

* recover codes

* control include env for gpu_timer.h

* fix logic for profiler_helper_static.py

* fix build error

* fix build error

* recover thirdparty

* add flag control: not support new ir now

* set auto_parallel_profiler flag to false

* fix

* add auto_parallel_profiler as command parameter

* fix value name

* support gettimeofday for win env

* fix win build error

* fix win build error

* use job_type_to_id

* Fixed repeatedly timing the same stream

* add step line for timeline

* add step timeline and fix logic when job overlap

* update time record logic

* fix bug when start profile start from none zero step

* fix note

* remove FLAGS_auto_parallel_profiler

* use run config instead FLAGS_auto_parallelxx

* fix color map logic

* fix color map logic

* fix bug when log step does not start from 0

* fix

* fix

* don't use set_enable_auto_parallel_profiler

* fix bug

* disable auto_parallel_profiler when not open flag by command line

* fix bug

* remove resettime

* fix build bug

* fix

* remove set enable

* fix build error

* fix build error

* fix build error

* fix ci error

* fix

* fix run error

* fix

* fix

* fix calculate_stream_timer logic

* remove fluid head

* fix build error

* set default value for enable_job_schedule_profiler
@AndSonder AndSonder changed the title [feat][AutoParallel] Visualize flow parallel timing diagram in static graph mode [AutoParallel] Visualize flow parallel timing diagram in static graph mode Dec 7, 2023
@AndSonder AndSonder deleted the add_profiler branch April 23, 2024 13:56