Traceback (most recent call last):
  File "/xxx/Megatron-LLaMA/pretrain_llama.py", line 119, in <module>
    pretrain(train_valid_test_datasets_provider, model_provider,
  File "/xxx/Megatron-LLaMA/megatron/training.py", line 153, in pretrain
    iteration = train(forward_step_func,
  File "/xxx/Megatron-LLaMA/megatron/training.py", line 759, in train
    save_checkpoint_and_time(iteration, model, optimizer,
  File "/xxx/Megatron-LLaMA/megatron/training.py", line 679, in save_checkpoint_and_time
    save_checkpoint(iteration, model, optimizer, opt_param_scheduler)
  File "/xxx/Megatron-LLaMA/megatron/checkpointing.py", line 373, in save_checkpoint
    optimizer.save_parameter_state(
  File "/xxx/Megatron-LLaMA/megatron/optimizer/overlapped_dist_optimizer.py", line 1000, in save_parameter_state
    torch.distributed.gather(
  File "/xxx/anaconda3/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py", line 2540, in gather
    work = group.gather(output_tensors, input_tensors, opts)
RuntimeError: Tensors must be CUDA and dense