Hello, I have run into a problem.
When I run: [torchrun --nproc_per_node 4 train.py --scale small --data_dir ./Data --output_dir ./Results/ --exp_name clip_score_train_results],
I get this error: [from training.distributed import world_info_from_env
ModuleNotFoundError: No module named 'training'].
I have tried installing the module with both pip and conda, but it still cannot be found.
Sorry, I have solved the problem above, but another problem still exists.
The problem is:
[W socket.cpp:464] [c10d] The server socket cannot be initialized on [::]:29500 (errno: 97 - Address family not supported by protocol).
[W socket.cpp:697] [c10d] The client socket cannot be initialized to connect to [localhost]:29500
Could you please help me with this?
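For anyone hitting the same warnings: the address [::] in the first message is the IPv6 wildcard, and errno 97 on Linux is EAFNOSUPPORT, so the message suggests that IPv6 sockets are not available on the machine. The "W" prefix marks these as warnings rather than errors, and whether training still proceeds over IPv4 is not confirmed in this thread. A minimal diagnostic sketch, assuming only the Python standard library (the helper name `can_bind` is hypothetical and not part of PyTorch):

```python
import socket


def can_bind(family, host, port=0):
    """Try to bind a TCP socket of the given address family.

    Returns (True, None) on success, or (False, errno) on failure.
    Port 0 asks the OS for any free port, so this does not touch 29500.
    """
    try:
        with socket.socket(family, socket.SOCK_STREAM) as sock:
            sock.bind((host, port))
        return True, None
    except OSError as exc:
        return False, exc.errno


if __name__ == "__main__":
    checks = (
        ("IPv4", socket.AF_INET, "127.0.0.1"),
        ("IPv6", socket.AF_INET6, "::"),
    )
    for label, family, host in checks:
        ok, err = can_bind(family, host)
        status = "available" if ok else f"unavailable (errno {err})"
        print(f"{label}: {status}")
```

If the IPv6 check reports errno 97 while IPv4 works, the warnings most likely reflect a missing IPv6 stack rather than a problem with the training script itself.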
I have the same problem as you (from training.distributed ..... No module named 'training').
Can you tell me how you solved it? It would help me a lot :)
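The fix used above was not posted, so the following is only a guess at one common cause: Python raises this error when the directory containing the `training` package is not on its import path. A minimal sketch for checking this, assuming `training` is a package directory located somewhere at or above the working directory (the thread does not confirm this, and the helper names `find_package_parent` and `ensure_importable` are hypothetical):

```python
import importlib
import importlib.util
import sys
from pathlib import Path


def find_package_parent(start, package="training"):
    """Walk upward from `start` and return the first directory that
    contains a sub-directory named `package`, or None if there is none."""
    start = Path(start).resolve()
    for candidate in (start, *start.parents):
        if (candidate / package).is_dir():
            return candidate
    return None


def ensure_importable(package="training", start="."):
    """Return True if `package` can be imported, adding its parent
    directory to sys.path first when it is found on disk."""
    if importlib.util.find_spec(package) is not None:
        return True  # already importable, nothing to do
    parent = find_package_parent(start, package)
    if parent is None:
        return False  # not found near `start`; it may need installing
    sys.path.insert(0, str(parent))
    importlib.invalidate_caches()
    return importlib.util.find_spec(package) is not None


if __name__ == "__main__":
    print("training importable:", ensure_importable("training"))
```

If this prints False, the package is probably not on disk near the script at all, and it would have to be installed from whichever project provides it.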