Even though you are launching block_bucketize_sparse_features and the embedding lookup kernel on different streams, they depend on each other: block_bucketize_sparse_features must finish before the KJT (KeyedJaggedTensor) all2all can run, and the embedding lookup can only start after that all2all completes. Because of this chain, they cannot be scheduled in parallel. They would also be competing for the same compute resources on the GPU.
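To make the dependency argument concrete, here is a toy scheduling model. It is a sketch, not how the CUDA runtime actually schedules work. Each op runs on a named stream, and an op starts only when its stream is idle and every op it depends on has finished. The op names, stream labels, and durations are all hypothetical. The model also assumes unlimited compute resources, so it is optimistic. In practice, the shared-SM contention mentioned above reduces overlap even for independent kernels.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    name: str
    stream: str          # hypothetical label for the CUDA stream the op is enqueued on
    duration: float      # hypothetical runtime in arbitrary units
    deps: list = field(default_factory=list)  # names of ops that must finish first


def schedule(ops):
    """Compute (start, end) for each op, given in enqueue order.

    An op starts once its own stream is idle (ops on one stream run in
    order) AND every op it depends on has finished. Compute resources are
    treated as unlimited, so this is an optimistic bound on overlap.
    """
    stream_free = {}
    times = {}
    for op in ops:
        deps_done = max((times[d][1] for d in op.deps), default=0.0)
        start = max(deps_done, stream_free.get(op.stream, 0.0))
        end = start + op.duration
        times[op.name] = (start, end)
        stream_free[op.stream] = end
    return times


def makespan(times):
    """Time at which the last op finishes."""
    return max(end for _, end in times.values())


# The pipeline from the thread: each stage consumes the previous stage's output.
chained = [
    Op("bucketize", "stream_a", 1.0),
    Op("kjt_all2all", "stream_b", 2.0, ["bucketize"]),
    Op("embedding_lookup", "stream_c", 3.0, ["kjt_all2all"]),
]

# Same ops on the same streams with the data dependencies removed, for comparison.
independent = [Op(o.name, o.stream, o.duration) for o in chained]

if __name__ == "__main__":
    print(schedule(chained))
    print("chained makespan:", makespan(schedule(chained)))          # 6.0
    print("independent makespan:", makespan(schedule(independent)))  # 3.0
```

With the dependencies in place, the total time is the sum of the stage durations (6.0), even though every op has its own stream. Only when the dependencies are removed do the stages overlap, and the total drops to the longest single op (3.0). Separate streams only help when the work on them is actually independent.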
Closing for now; feel free to reopen if this hasn't addressed your question.