Disabling Unrelated Tests When Enabling CUDA Async Allocator in CI #65094

Merged
From00 merged 6 commits into PaddlePaddle:develop from eee4017:lawu/disable_tests
Jul 4, 2024

Conversation

@eee4017
Contributor

@eee4017 eee4017 commented Jun 12, 2024

PR Category

Others

PR Types

Others

Description

When the CUDA Async Allocator is enabled in CI, this PR disables the tests that are unrelated to it.
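
One common way to disable tests under a particular allocator setting is to skip them when an environment flag is set. The sketch below illustrates that pattern with Python's `unittest`. The environment variable name `FLAGS_use_cuda_malloc_async_allocator`, the helper `async_allocator_enabled`, and the test class are assumptions made for illustration; this conversation does not show the mechanism the PR actually uses.

```python
import os
import unittest

# Assumed flag name, for illustration only; the conversation does not show
# which flag or mechanism the PR actually relies on.
ASYNC_ALLOCATOR_ENV = "FLAGS_use_cuda_malloc_async_allocator"


def async_allocator_enabled(environ=None):
    """Return True when the (assumed) async-allocator flag is switched on."""
    environ = os.environ if environ is None else environ
    value = environ.get(ASYNC_ALLOCATOR_ENV, "0")
    return value.strip().lower() in ("1", "true")


class TestUnrelatedFeature(unittest.TestCase):
    """Hypothetical test that does not exercise the allocator."""

    def test_feature(self):
        # Checked at run time so the decision follows the current environment.
        if async_allocator_enabled():
            self.skipTest("unrelated to the CUDA async allocator")
        self.assertEqual(1 + 1, 2)
```

Skipping inside the test body, rather than with a decorator evaluated at import time, keeps the decision tied to the environment in effect when the test runs.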

@paddle-bot

paddle-bot bot commented Jun 12, 2024

Your PR has been submitted successfully. Thank you for contributing to this open-source project!
Please watch for the upcoming automated CI test results. See the Paddle CI Manual for details.

@paddle-bot paddle-bot bot added the contributor External developers label Jun 12, 2024
@jeng1220
Collaborator

CI failed, but the failure was NOT related to this PR:

ERROR: test_simple_net_hybrid_strategy (__main__.TestSemiAutoParallelLlamaDataLoader)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/workspace/Paddle/test/collective/test_communication_api_base.py", line 79, in run_test_case
    self._launcher = subprocess.run(
  File "/usr/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['/usr/bin/python', '-u', '-m', 'paddle.distributed.launch', '--log_dir', '/tmp/tmpkx_sv8w9', '--devices', '0,1,2,3,4,5,6,7', 'semi_auto_llama_dataloader.py']' returned non-zero exit status 1.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/workspace/Paddle/test/auto_parallel/hybrid_strategy/test_semi_auto_parallel_llama_model.py", line 224, in test_simple_net_hybrid_strategy
    self.run_test_case(
  File "/workspace/Paddle/test/collective/test_communication_api_base.py", line 90, in run_test_case
    raise RuntimeError(
RuntimeError: Error occurs when running this test case. The return code of command ['/usr/bin/python', '-u', '-m', 'paddle.distributed.launch', '--log_dir', '/tmp/tmpkx_sv8w9', '--devices', '0,1,2,3,4,5,6,7', 'semi_auto_llama_dataloader.py'] is 1
----------------------------------------------------------------------
Ran 9 tests in 444.641s
FAILED (errors=1)

@tianshuo78520a
Collaborator

OK, I'll look into the cause.

@onecatcn onecatcn requested a review from risemeup1 June 14, 2024 02:16
@eee4017 eee4017 force-pushed the lawu/disable_tests branch from 492ab3c to 1ca684f Compare June 19, 2024 06:38
@paddle-ci-bot

paddle-ci-bot bot commented Jun 27, 2024

Sorry to inform you that the CI runs for 1ca684f passed more than 7 days ago. To prevent PR conflicts, you need to re-run all CI jobs manually.

@eee4017 eee4017 force-pushed the lawu/disable_tests branch from 1ca684f to b9f9e08 Compare June 27, 2024 03:35
Comment on lines 267 to 268
if (FLAGS_use_cuda_managed_memory) {
PADDLE_ENFORCE_EQ(FLAGS_use_cuda_managed_memory,
Contributor

Doesn't the check logic here conflict with the if condition?

Contributor Author

Yes, fixed
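
The concern raised in this review thread can be illustrated with a minimal sketch: an equality check placed inside a branch whose condition already contradicts it can never pass. The sketch is in Python with hypothetical names (`enforce_eq`, `configure_conflicting`, `configure_fixed`). It is not the PR's C++ code, the truncated snippet above does not show which value the real check expected, and the "fixed" variant is only one possible resolution, not necessarily the one applied here.

```python
def enforce_eq(actual, expected, message):
    """Stand-in for an enforce-style check: raise if the values differ."""
    if actual != expected:
        raise ValueError(message)


def configure_conflicting(use_managed_memory):
    # The branch runs only when the flag is True, yet the check demands
    # False, so entering the branch always raises.
    if use_managed_memory:
        enforce_eq(use_managed_memory, False, "managed memory is unsupported here")
        return "managed"  # unreachable
    return "default"


def configure_fixed(use_managed_memory):
    # One possible resolution (an assumption): drop the guarding branch so
    # the check simply rejects the unsupported setting.
    enforce_eq(use_managed_memory, False, "managed memory is unsupported here")
    return "default"
```

In the conflicting version the guarded body can only fail, which is the kind of contradiction between an `if` condition and the check beneath it that the reviewer asks about.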

@eee4017 eee4017 requested a review from zyfncg July 2, 2024 08:09
@eee4017
Contributor Author

eee4017 commented Jul 4, 2024

You must have one RD (From00, zhangbo9674) approval for file changes in paddle/fluid/framework/new_executor.

@onecatcn onecatcn requested a review from From00 July 4, 2024 02:37
Contributor

@From00 From00 left a comment

LGTM

@From00 From00 merged commit a30c8a5 into PaddlePaddle:develop Jul 4, 2024
Labels

contributor External developers NVIDIA

6 participants