Contributor

@Difers Difers commented Oct 13, 2025

Before submitting

  • Lint code. If there are lint issues, please format the code first.
# Install and register `pre-commit` in the project folder
pip install pre-commit && pre-commit install

# Process previous code files separately
pre-commit run --files XXXX.py
  • Add test cases into tests folder. If there are codecov issues, please add test cases first.

PR types

Bug fixes

PR changes

APIs

Description

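When tensorwise_offload_optimizer and uc are both enabled, saving currently triggers an offload, but there is no reload switch. As a result, when saving the optimizer performs communication, the tensors are still in pinned memory, so send/recv fails and an NCCL error is reported.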

paddle-bot bot commented Oct 13, 2025

Thanks for your contribution!

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (develop@2007f73). Learn more about missing BASE report.

Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #2722   +/-   ##
==========================================
  Coverage           ?   29.45%           
==========================================
  Files              ?      312           
  Lines              ?    54625           
  Branches           ?        0           
==========================================
  Hits               ?    16088           
  Misses             ?    38537           
  Partials           ?        0           


  tensor = (
      state_dict[key] if padding_start >= padding_end else state_dict[key][: padding_start - begin]
- )
+ ).cuda()
Collaborator


We need to account for the reload causing a sharp increase in GPU memory usage, which could lead to OOM.
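
One possible way to limit that memory spike, offered only as a sketch: reload a single tensor at a time, communicate it, and drop the temporary GPU copy before moving to the next key, so that at most one reloaded tensor is alive at once. `FakeTensor`, `save_with_bounded_reload`, and the `send` callback are hypothetical stand-ins introduced for illustration; they are not PaddleNLP or Paddle APIs, and a real allocator may cache freed memory rather than return it immediately.

```python
from dataclasses import dataclass


@dataclass
class FakeTensor:
    """Hypothetical stand-in for a framework tensor; tracks only size and placement."""

    nbytes: int
    place: str = "pinned"  # "pinned" (host memory) or "gpu"

    def cuda(self):
        # Return a new GPU-resident copy, mirroring what tensor.cuda() does.
        return FakeTensor(self.nbytes, "gpu")


def save_with_bounded_reload(state_dict, send):
    """Reload tensors one by one so only one temporary GPU copy exists at a time.

    `send` is an assumed callback standing in for the real communication call.
    Returns the peak number of bytes temporarily reloaded onto the GPU.
    """
    peak = 0
    for key, tensor in state_dict.items():
        needs_reload = tensor.place != "gpu"
        gpu_tensor = tensor.cuda() if needs_reload else tensor
        if needs_reload:
            peak = max(peak, gpu_tensor.nbytes)
        send(key, gpu_tensor)
        # Drop the temporary reference before handling the next key.
        del gpu_tensor
    return peak


if __name__ == "__main__":
    offloaded = {"moment1": FakeTensor(100), "moment2": FakeTensor(300)}
    peak_bytes = save_with_bounded_reload(offloaded, lambda k, t: print(k, t.place))
    print("peak reloaded bytes:", peak_bytes)
```

With this pattern the extra GPU memory is bounded by the largest single tensor rather than the whole offloaded optimizer state, and the offloaded originals stay in pinned memory.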
