
Training without Sequence Parallelism but VIDEO_SYNC_GROUP #162

Open
rob-hen opened this issue Nov 5, 2024 · 6 comments

Comments

@rob-hen

rob-hen commented Nov 5, 2024

Hi all,

The provided script train_pyramid_flow.sh does not set the flag use_sequence_parallel. In that case, what is the purpose of VIDEO_SYNC_GROUP=8? Why do we want all workers to use the same video?

rob-hen closed this as completed Nov 5, 2024
rob-hen reopened this Nov 5, 2024
@jy0205
Owner

jy0205 commented Nov 5, 2024

Hi, we do not use sequence parallelism during training. The VIDEO_SYNC_GROUP controls the number of processes that receive the same video batch as input. We find that this trick makes the gradient direction more stable (it optimizes the performance over the whole latent sequence of one video, rather than over single latents from different videos).
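
For illustration, here is a minimal sketch of how such group-shared sampling could be derived from the global rank. The helper name and the seeding scheme are my guesses for clarity, not the repository's actual code:

```python
import torch.distributed as dist

VIDEO_SYNC_GROUP = 8  # number of ranks that share one video batch

def sync_group_seed(base_seed: int) -> int:
    # Ranks 0-7 form group 0, ranks 8-15 form group 1, and so on.
    # All ranks in a group derive the same seed, so their data
    # samplers draw the same video; different groups draw different
    # videos. Requires an initialized process group.
    rank = dist.get_rank()
    group_id = rank // VIDEO_SYNC_GROUP
    return base_seed + group_id
```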

@rob-hen
Author

rob-hen commented Nov 5, 2024

Hi @jy0205,

Thank you for the answer.
So with VIDEO_SYNC_GROUP=8 and GPUS=8, all GPUs get exactly the same videos. However, I don't see the difference between the processes; all of them will use exactly the same latent (the same clip from the videos):

'video': video_latent,

@yjhong89

yjhong89 commented Nov 6, 2024

I think video_sync_group doesn't split the same video latent, but accepts the same video latent without splitting.

  • This part is different from sequence parallelism, which splits the latent along the time axis.
  • Is that right?
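
To make the contrast concrete, a sketch of what the time-axis split under sequence parallelism would look like (illustrative only; the training script discussed here does not take this path):

```python
import torch

def shard_along_time(latent: torch.Tensor, world_size: int, rank: int) -> torch.Tensor:
    # Sequence parallelism shards the latent along the temporal axis,
    # e.g. [B, T, C, H, W] -> world_size shards of [B, T/ws, C, H, W],
    # with each rank keeping only its own shard. Assumes dim 1 is time.
    return latent.chunk(world_size, dim=1)[rank]
```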


@jy0205
Owner

jy0205 commented Nov 6, 2024

I think video_sync_group doesn't split the same video latent, but accepts the same video latent without splitting.

  • This part is different from sequence parallelism, which splits the latent along the time axis.
  • Is that right?

Yes, you are right. The video_sync_group does not split the video. It works because different video ranks load different video lengths. You can see this in the sample_length method.
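
A rough sketch of what per-rank length sampling could look like. sample_length is the method named above, but its body and the length grid here are hypothetical, not the actual implementation:

```python
def sample_length(rank: int, video_sync_group: int,
                  lengths: tuple = (8, 16, 24, 32)) -> int:
    # Hypothetical: ranks inside one sync group share the same video,
    # but each rank samples a different temporal length, so together
    # the group trains on the video's latent sequence at several
    # durations rather than on one identical clip.
    local_id = rank % video_sync_group
    return lengths[local_id % len(lengths)]
```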

@jy0205
Owner

jy0205 commented Nov 6, 2024

All the stages employ uniform sampling. We make the video token sequence length-balanced (the sum of token lengths per batch is kept fixed).
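
A greedy sketch of such token-length balancing, assuming each sample records its token count under a hypothetical "num_tokens" key (not the repository's implementation):

```python
def length_balanced_batches(samples: list, token_budget: int) -> list:
    # Pack samples so each batch's total token length stays near a
    # fixed budget: accumulate until adding the next sample would
    # exceed the budget, then start a new batch.
    batches, batch, total = [], [], 0
    for s in sorted(samples, key=lambda s: s["num_tokens"], reverse=True):
        if batch and total + s["num_tokens"] > token_budget:
            batches.append(batch)
            batch, total = [], 0
        batch.append(s)
        total += s["num_tokens"]
    if batch:
        batches.append(batch)
    return batches
```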
