
RETRO: RuntimeError: stack expects each tensor to be equal size, but got [2, 32] at entry 0 and [1, 32] at entry 29 #135

Open
mocarsha opened this issue Jul 21, 2022 · 2 comments

mocarsha commented Jul 21, 2022

Hi,

Running the exact code from GitHub for DeepMind's retrieval transformer (RETRO), I get the following error:

RuntimeError: stack expects each tensor to be equal size, but got [2, 32] at entry 0 and [1, 32] at entry 29

Could you please help me with this? I used the same dataset as in the code.
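
For context, the error itself only means that torch.stack was given tensors of unequal shape: entry 0 of the list has shape [2, 32] while entry 29 has shape [1, 32] (presumably two retrieved neighbors for one chunk vs. only one for another). A minimal standalone snippet, purely illustrative and not the RETRO dataset code, reproduces the same message:

import torch

# Two elements with different first dimensions, matching the shapes in the error message.
a = torch.zeros(2, 32)   # e.g. a chunk that got two retrieved neighbors
b = torch.zeros(1, 32)   # e.g. a chunk that got only one
torch.stack([a, b])      # RuntimeError: stack expects each tensor to be equal size,
                         # but got [2, 32] at entry 0 and [1, 32] at entry 1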

mocarsha reopened this Jul 21, 2022

vpj commented Aug 7, 2022

Can you please provide the full error?

vpj self-assigned this Aug 7, 2022
vpj added the "question" (Further information is requested) label Aug 7, 2022
vpj closed this as completed Aug 27, 2022
Zahin112 commented Jun 26, 2024

I am having the same issue while running train.py. Here's the full error in detail:

Load data...[DONE] 2.39ms
Tokenize...[DONE] 29.36ms
Build vocabulary...[DONE] 0.62ms
Load BERT tokenizer...[DONE] 340.26ms
Load BERT model...[DONE] 882.21ms
Load index...[DONE] 69.50ms
2024-06-25 11:59:59.603955: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-06-25 11:59:59.604002: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-06-25 11:59:59.605299: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-06-25 11:59:59.611551: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-06-25 12:00:00.750669: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
No labml server url specified. Please start a labml server and specify the URL. Docs: https://github.com/labmlai/labml/tree/master/app

retro_small: 706e157632ea11ef989a0242ac1c000c
[clean]: "cleanup notebooks"
116: Train: 5% 88,760ms loss.train: 3.71168 88,760ms 0:00m/ 0:47m
Traceback (most recent call last):
  File "/content/annotated_deep_learning_paper_implementations/labml_nn/transformers/retro/train.py", line 225, in <module>
    train()
  File "/content/annotated_deep_learning_paper_implementations/labml_nn/transformers/retro/train.py", line 213, in train
    trainer()
  File "/content/annotated_deep_learning_paper_implementations/labml_nn/transformers/retro/train.py", line 134, in __call__
    for i, (src, tgt, neighbors) in monit.enum('Train', self.dataloader):
  File "/usr/local/lib/python3.10/dist-packages/labml/internal/monitor/iterator.py", line 84, in __next__
    next_value = next(self._iterator)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 631, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 675, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.10/dist-packages/labml_nn/transformers/retro/dataset.py", line 131, in __getitem__
    neighbors = torch.stack([torch.stack([self.tds.text_to_i(n) for n in chunks]) for chunks in s[2]])
RuntimeError: stack expects each tensor to be equal size, but got [2, 32] at entry 0 and [1, 32] at entry 31

Also, it says "No labml server url specified. Please start a labml server and specify the URL." Do I need to create the server? Is it required? Can you explain, please?
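
From the traceback, the failure is in __getitem__ in dataset.py, where the per-chunk neighbor lists are stacked; the message suggests one chunk (entry 31) ended up with a single neighbor while the others have two, so torch.stack cannot form a uniform tensor. A minimal local workaround sketch, assuming s[2] is the list of per-chunk neighbor texts as the traceback implies (an illustration only, not the upstream fix), would be to truncate every chunk to the smallest neighbor count before stacking:

# Hypothetical edit inside __getitem__ in dataset.py; names taken from the traceback above.
min_n = min(len(chunks) for chunks in s[2])   # smallest neighbor count across chunks
neighbors = torch.stack([
    torch.stack([self.tds.text_to_i(n) for n in chunks[:min_n]])
    for chunks in s[2]
])

Padding the shorter neighbor lists to a common count would be an alternative that keeps all retrieved neighbors.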

vpj reopened this Jun 27, 2024