
@ruisizhang123 commented on Nov 24, 2025

Validate DSV3 manual bucketing when EP (expert parallelism) and TP (tensor parallelism) are enabled, using the DSV3-16B model.

(Single Node: BS = 1)

| Node | Method | Parallelism | Memory | TPS | Trace |
|---|---|---|---|---|---|
| 1-Node (8× H100) | SimpleFSDP (aot_eager) | FSDP=4, EP=2 | | | Link |
| 1-Node (8× H100) | FSDP2-eager | FSDP=4, EP=2 | 55.19 GiB (58.10%) | 8,003 | Link |
| 1-Node (8× H100) | SimpleFSDP (aot_eager) | FSDP=2, TP=2, EP=2 | | | Link |
| 1-Node (8× H100) | FSDP2-eager | FSDP=2, TP=2, EP=2 | | | Link |
| 8-Node (64× H100) | SimpleFSDP (aot_eager) | FSDP=4, EP=2 | | | Link |
| 8-Node (64× H100) | FSDP2-eager | FSDP=4, EP=2 | | | Link |
| 8-Node (64× H100) | SimpleFSDP (aot_eager) | FSDP=2, TP=2, EP=2 | | | Link |
| 8-Node (64× H100) | FSDP2-eager | FSDP=2, TP=2, EP=2 | | | Link |
  1. Loss Equivalence

@meta-cla bot added the CLA Signed label on Nov 24, 2025
@ruisizhang123 ruisizhang123 marked this pull request as draft November 24, 2025 17:19