I have successfully installed bitsandbytes 0.43.3, but now I'm hitting a new error.
2024-08-09 10:56:17,298 [INFO] (main) Loading our accelerator...
Modules with uninitialized parameters can't be used with DistributedDataParallel. Run a dummy forward pass to correctly initialize the modules
Traceback (most recent call last):
File "/mnt/data1/wangxi/SimpleTuner/train.py", line 2762, in
main()
File "/mnt/data1/wangxi/SimpleTuner/train.py", line 1341, in main
results = accelerator.prepare(
File "/home/wangxi/miniconda3/envs/flux/lib/python3.10/site-packages/accelerate/accelerator.py", line 1299, in prepare
result = tuple(
File "/home/wangxi/miniconda3/envs/flux/lib/python3.10/site-packages/accelerate/accelerator.py", line 1300, in
self._prepare_one(obj, first_pass=True, device_placement=d) for obj, d in zip(args, device_placement)
File "/home/wangxi/miniconda3/envs/flux/lib/python3.10/site-packages/accelerate/accelerator.py", line 1176, in _prepare_one
return self.prepare_model(obj, device_placement=device_placement)
File "/home/wangxi/miniconda3/envs/flux/lib/python3.10/site-packages/accelerate/accelerator.py", line 1435, in prepare_model
model = torch.nn.parallel.DistributedDataParallel(
File "/home/wangxi/miniconda3/envs/flux/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 784, in init
self._log_and_throw(
File "/home/wangxi/miniconda3/envs/flux/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1127, in _log_and_throw
raise err_type(err_msg)
RuntimeError: Modules with uninitialized parameters can't be used with DistributedDataParallel. Run a dummy forward pass to correctly initialize the modules
[rank0]:[W809 10:56:25.219082534 ProcessGroupNCCL.cpp:1168] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present, but this warning has only been added since PyTorch 2.4 (function operator())
Originally posted by @bghira in #676 (comment)
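
For context, DistributedDataParallel raises this RuntimeError when the model it is asked to wrap still contains lazily-initialized parameters (torch.nn.parameter.UninitializedParameter) at the time accelerator.prepare hands it over. Below is a minimal sketch of what the message asks for, using a hypothetical toy model with nn.LazyLinear rather than SimpleTuner's actual model: run one dummy forward pass so every parameter gets a concrete shape before DDP wraps it.

```python
# Minimal sketch, not SimpleTuner's code: a model with lazy modules whose
# parameters stay UninitializedParameter until the first forward pass.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.LazyLinear(64),   # weight/bias are UninitializedParameter for now
    nn.ReLU(),
    nn.LazyLinear(10),
)

# Wrapping here would reproduce the RuntimeError from the traceback above,
# because DDP refuses modules whose parameters are still uninitialized.

# One dummy forward pass materializes all lazy parameters (the input shape
# is made up for this example):
with torch.no_grad():
    model(torch.zeros(2, 128))

# Now every parameter has a concrete shape, so wrapping no longer errors
# (an initialized process group is required, hence left commented out here):
# ddp_model = nn.parallel.DistributedDataParallel(model)
```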
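The trailing ProcessGroupNCCL warning is a separate, more benign issue: since PyTorch 2.4 the runtime warns if the script exits without tearing the process group down, and here it only fires because the crash skips the normal shutdown path. A hedged sketch of the cleanup the warning suggests (I haven't checked whether SimpleTuner's train.py already does this on normal exit):

```python
# Sketch of the explicit teardown the warning asks for; call this at the end
# of main(), or register it with atexit so it also runs on early exits.
import atexit
import torch.distributed as dist

def cleanup_process_group():
    # Only meaningful if a process group was actually initialized.
    if dist.is_available() and dist.is_initialized():
        dist.destroy_process_group()

atexit.register(cleanup_process_group)
```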