Fix VBE+SSD path with StagedPipeline (#2607)
Summary:
Pull Request resolved: #2607

When SSD offloading is configured and VBE is used, as in the Jupiter XL case, we call into:
```
if sparse_features.variable_stride_per_key() and len(embeddings) > 1:
    embeddings = self._merge_variable_batch_embeddings(embeddings, vbe_splits)
```
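
For context, here is a hedged sketch of what a merge helper like `_merge_variable_batch_embeddings` has to do (the name comes from the snippet above; the exact regrouping order is an assumption, not the torchrec source). Each per-key embedding tensor is split by its per-rank VBE element counts, regrouped rank by rank, and concatenated; this only works if every tensor in `embeddings` is 1-D:

```
import torch
from typing import List

# Hedged sketch, not the torchrec implementation: merge per-key VBE embedding
# outputs into one flat tensor. Assumes each tensor in `embeddings` is 1-D and
# `splits[i]` holds the per-rank element counts for key i.
def merge_variable_batch_embeddings(
    embeddings: List[torch.Tensor], splits: List[List[int]]
) -> torch.Tensor:
    # The split below is the line that crashes in the traceback further down
    # when a kernel returns a 2-D tensor instead of a flat 1-D one.
    split_embs = [e.split(s) for e, s in zip(embeddings, splits)]
    # Regroup rank by rank so each rank's chunks from all keys sit together.
    num_ranks = len(splits[0])
    return torch.cat([e[rank] for rank in range(num_ranks) for e in split_embs])
```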

and the split inside that helper throws:

```
File "/mnt/xarfuse/uid-179947/2baf53ce-seed-nspid4026531836_cgpid5102251-ns-4026531841/torchrec/distributed/embedding_lookup.py", line 530, in <listcomp>
    split_embs = [e.split(s) for e, s in zip(embeddings, splits)]
  File "/mnt/xarfuse/uid-179947/2baf53ce-seed-nspid4026531836_cgpid5102251-ns-4026531841/torch/_tensor.py", line 1028, in split
    return torch._VF.split_with_sizes(self, split_size, dim)
RuntimeError: split_with_sizes expects split_sizes to sum exactly to 16 (input tensor's size at dimension 0), but got split_sizes=[4096]
```
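
The numbers in the message line up with that: `split_sizes=[4096]` is an element count for a flat VBE output, while the kernel returned a tensor whose dimension 0 is 16. A hypothetical repro, with shapes chosen to match the log rather than taken from the actual job:

```
import torch

# With the VBE path taken, the kernel returns a flat 1-D output and
# element-count splits work:
flat = torch.randn(4096)
flat.split([4096])  # OK: sizes sum to flat.size(0) == 4096

# Without batch_size_per_feature_per_rank, the output stays 2-D, e.g.
# [B, D] = [16, 256], and the same element-count splits no longer sum
# to size(0):
two_d = torch.randn(16, 256)
try:
    two_d.split([4096])
except RuntimeError as e:
    # split_with_sizes expects split_sizes to sum exactly to 16
    # (input tensor's size at dimension 0), but got split_sizes=[4096]
    print(e)
```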

We need to make sure we pass `batch_size_per_feature_per_rank` for the `SSDTableBatchedEmbeddingBags` case as well; otherwise the embedding output won't have the expected 1-D shape.
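
A minimal sketch of the fixed dispatch, using hypothetical stand-in classes; only the `isinstance` tuple mirrors the diff below, and the call signature is an assumption based on the kwarg named above:

```
import torch
from typing import List, Optional

# Hypothetical stand-ins for the fbgemm kernel classes named in the diff.
class SplitTableBatchedEmbeddingBagsCodegen: ...
class DenseTableBatchedEmbeddingBagsCodegen: ...
class SSDTableBatchedEmbeddingBags: ...

def lookup(
    emb_module,
    indices: torch.Tensor,
    offsets: torch.Tensor,
    batch_size_per_feature_per_rank: Optional[List[List[int]]],
) -> torch.Tensor:
    if isinstance(
        emb_module,
        (
            SplitTableBatchedEmbeddingBagsCodegen,
            DenseTableBatchedEmbeddingBagsCodegen,
            SSDTableBatchedEmbeddingBags,  # the one-line fix: SSD joins the VBE path
        ),
    ):
        # Forwarding the VBE batch sizes makes the kernel return a flat 1-D
        # tensor, which is what _merge_variable_batch_embeddings can split.
        return emb_module(
            indices,
            offsets,
            batch_size_per_feature_per_rank=batch_size_per_feature_per_rank,
        )
    # Kernels outside the tuple never receive the VBE metadata.
    return emb_module(indices, offsets)
```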

Reviewed By: dstaay-fb, sryap

Differential Revision: D66647146

fbshipit-source-id: 537390988ca737617760a5a040e94b762174cf2f
chrisxcai authored and facebook-github-bot committed Dec 4, 2024
1 parent 40a3727 commit 7819471
Showing 1 changed file with 1 addition and 0 deletions: torchrec/distributed/batched_embedding_kernel.py
```
@@ -1166,6 +1166,7 @@ def forward(self, features: KeyedJaggedTensor) -> torch.Tensor:
             (
                 SplitTableBatchedEmbeddingBagsCodegen,
                 DenseTableBatchedEmbeddingBagsCodegen,
+                SSDTableBatchedEmbeddingBags,
             ),
         ):
             return self.emb_module(
```
