System Info

optimum-habana: 1.13.2
HL-SMI version: hl-1.17.1-fw-51.5.0
Driver version: 1.17.1-78932ae
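These versions can be read back from inside the container with standard tooling; hl-smi is Habana's system-management CLI and is the source of the HL-SMI/driver banner quoted above:

# Installed optimum-habana version
pip show optimum-habana
# HL-SMI and driver versions
hl-smi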
Information

The official example scripts in the examples folder.

Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, ...).

Reproduction
1. Download the Qwen1.5-14B weights from https://huggingface.co/Qwen/Qwen1.5-14B
2. cd optimum-habana/examples/language-modeling
3. Launch fine-tuning with DeepSpeed ZeRO-3 across 8 HPUs:

python ../gaudi_spawn.py \
    --world_size 8 --use_deepspeed run_clm.py \
    --model_name_or_path /data/models/Qwen1.5-7B-Chat/ \
    --dataset_name wikitext \
    --dataset_config_name wikitext-2-raw-v1 \
    --per_device_train_batch_size 6 \
    --per_device_eval_batch_size 4 \
    --do_train \
    --do_eval \
    --output_dir /tmp/test-clm-xl-1 \
    --gaudi_config_name ./gaudi_config.json \
    --use_habana \
    --logging_steps 1 \
    --use_lazy_mode \
    --gradient_checkpointing \
    --use_hpu_graphs_for_inference \
    --throughput_warmup_steps 3 \
    --overwrite_output_dir \
    --deepspeed ./llama2_ds_zero3_config.json
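The launcher points at ./llama2_ds_zero3_config.json for the DeepSpeed settings. As a point of reference, a ZeRO stage-3 config of the kind this example expects looks roughly like the sketch below; the values are illustrative, not the exact contents of the file shipped with optimum-habana, so defer to the copy in your checkout:

{
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "bf16": {
    "enabled": true
  },
  "zero_optimization": {
    "stage": 3,
    "overlap_comm": false,
    "contiguous_gradients": true,
    "stage3_gather_16bit_weights_on_model_save": true
  }
}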
The error log from the run is as follows:
[2024-09-17 07:57:31,077] [INFO] [checkpointing.py:542:forward] Activation Checkpointing Information
[2024-09-17 07:57:31,078] [INFO] [checkpointing.py:543:forward] ----Partition Activations False, CPU CHECKPOINTING False
[2024-09-17 07:57:31,078] [INFO] [checkpointing.py:544:forward] ----contiguous Memory Checkpointing False with None total layers
[2024-09-17 07:57:31,078] [INFO] [checkpointing.py:546:forward] ----Synchronization False
[2024-09-17 07:57:31,078] [INFO] [checkpointing.py:547:forward] ----Profiling time in checkpointing False
[rank3]: Traceback (most recent call last):
[rank3]:   File "/home/jane/optimum-habana/examples/language-modeling/run_clm.py", line 695, in <module>
[rank3]:     main()
[rank3]:   File "/home/jane/optimum-habana/examples/language-modeling/run_clm.py", line 641, in main
[rank3]:     train_result = trainer.train(resume_from_checkpoint=checkpoint)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/optimum/habana/transformers/trainer.py", line 553, in train
[rank3]:     return inner_training_loop(
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/optimum/habana/transformers/trainer.py", line 978, in _inner_training_loop
[rank3]:     tr_loss_step = self.training_step(model, inputs)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/optimum/habana/transformers/trainer.py", line 1575, in training_step
[rank3]:     loss = self.compute_loss(model, inputs)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 3363, in compute_loss
[rank3]:     outputs = model(**inputs)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1535, in _wrapped_call_impl
[rank3]:     return self._call_impl(*args, **kwargs)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1544, in _call_impl
[rank3]:     return forward_call(*args, **kwargs)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
[rank3]:     ret_val = func(*args, **kwargs)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/deepspeed/runtime/engine.py", line 1885, in forward
[rank3]:     loss = self.module(*inputs, **kwargs)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1535, in _wrapped_call_impl
[rank3]:     return self._call_impl(*args, **kwargs)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1585, in _call_impl
[rank3]:     result = forward_call(*args, **kwargs)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/optimum/habana/transformers/models/qwen2/modeling_qwen2.py", line 789, in forward
[rank3]:     outputs = self.model(
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1535, in _wrapped_call_impl
[rank3]:     return self._call_impl(*args, **kwargs)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1585, in _call_impl
[rank3]:     result = forward_call(*args, **kwargs)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/optimum/habana/transformers/models/qwen2/modeling_qwen2.py", line 677, in forward
[rank3]:     layer_outputs = self._gradient_checkpointing_func(
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/optimum/habana/transformers/trainer.py", line 692, in hpu_deepspeed_checkpointing
[rank3]:     CheckpointFunction.apply(function, all_outputs, *checkpoint_args)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/torch/autograd/function.py", line 598, in apply
[rank3]:     return super().apply(*args, **kwargs)  # type: ignore[misc]
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/deepspeed/runtime/activation_checkpointing/checkpointing.py", line 568, in forward
[rank3]:     outputs = run_function(*inputs_cuda)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1535, in _wrapped_call_impl
[rank3]:     return self._call_impl(*args, **kwargs)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1585, in _call_impl
[rank3]:     result = forward_call(*args, **kwargs)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/optimum/habana/transformers/models/qwen2/modeling_qwen2.py", line 464, in forward
[rank3]:     hidden_states, self_attn_weights, present_key_value = self.pre_attn(
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/optimum/habana/transformers/models/qwen2/modeling_qwen2.py", line 515, in pre_attn
[rank3]:     hidden_states, attn_weights, present_key_value = self.self_attn.pre_attn_forward(
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/optimum/habana/transformers/models/qwen2/modeling_qwen2.py", line 401, in pre_attn_forward
[rank3]:     attn_output = self.o_proj(attn_output)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1535, in _wrapped_call_impl
[rank3]:     return self._call_impl(*args, **kwargs)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1574, in _call_impl
[rank3]:     args_result = hook(self, args)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
[rank3]:     ret_val = func(*args, **kwargs)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/deepspeed/runtime/zero/parameter_offload.py", line 278, in _pre_forward_module_hook
[rank3]:     self.pre_sub_module_forward_function(module)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
[rank3]:     return func(*args, **kwargs)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/deepspeed/runtime/zero/parameter_offload.py", line 452, in pre_sub_module_forward_function
[rank3]:     param_coordinator.fetch_sub_module(sub_module, forward=True)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/eval_frame.py", line 451, in _fn
[rank3]:     return fn(*args, **kwargs)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
[rank3]:     ret_val = func(*args, **kwargs)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
[rank3]:     return func(*args, **kwargs)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/deepspeed/runtime/zero/partitioned_param_coordinator.py", line 290, in fetch_sub_module
[rank3]:     self.__all_gather_params(params_to_fetch, forward)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
[rank3]:     ret_val = func(*args, **kwargs)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/deepspeed/runtime/zero/partitioned_param_coordinator.py", line 434, in __all_gather_params
[rank3]:     self.__all_gather_params_(nonquantized_params, forward, quantize=self.zero_quantized_weights)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/deepspeed/runtime/zero/partitioned_param_coordinator.py", line 463, in __all_gather_params_
[rank3]:     handle = param_group[0].all_gather_coalesced(param_group, quantize=quantize)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
[rank3]:     ret_val = func(*args, **kwargs)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/deepspeed/runtime/zero/partition_parameters.py", line 1241, in all_gather_coalesced
[rank3]:     handles = _dist_allgather_fn(
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/deepspeed/runtime/zero/partition_parameters.py", line 95, in _dist_allgather_fn
[rank3]:     return instrument_w_nvtx(dist.allgather_fn)(output_tensor, input_tensor, group=group, async_op=True)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
[rank3]:     ret_val = func(*args, **kwargs)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/deepspeed/comm/comm.py", line 320, in allgather_fn
[rank3]:     return all_gather_into_tensor(output_tensor, input_tensor, group=group, async_op=async_op, debug=debug)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/deepspeed/comm/comm.py", line 117, in log_wrapper
[rank3]:     return func(*args, **kwargs)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/deepspeed/comm/comm.py", line 305, in all_gather_into_tensor
[rank3]:     return cdb.all_gather_into_tensor(output_tensor=output_tensor, input_tensor=tensor, group=group, async_op=async_op)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/eval_frame.py", line 451, in _fn
[rank3]:     return fn(*args, **kwargs)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/deepspeed/comm/torch.py", line 218, in all_gather_into_tensor
[rank3]:     return self.all_gather_function(output_tensor=output_tensor,
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/torch/distributed/c10d_logger.py", line 75, in wrapper
[rank3]:     return func(*args, **kwargs)
[rank3]:   File "/usr/local/lib/python3.10/dist-packages/torch/distributed/distributed_c10d.py", line 2949, in all_gather_into_tensor
[rank3]:     work = group._allgather_base(output_tensor, input_tensor, opts)
[rank3]: RuntimeError: Graph compile failed. synStatus=synStatus 26 [Generic failure].
Ranks 0, 1, 4, 5, 6, and 7 raised identical tracebacks through the same call path, each likewise ending in:

RuntimeError: Graph compile failed. synStatus=synStatus 26 [Generic failure].
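synStatus 26 is a generic Synapse graph-compile failure, so the compiler's own logs usually carry the actual reason. One way to surface them on the next run is via the Habana logging environment variables (variable names as documented for SynapseAI; verify them against your release):

# Route SynapseAI/graph-compiler logs to stdout with higher verbosity,
# then relaunch the same command to capture details of the compile failure
ENABLE_CONSOLE=true LOG_LEVEL_ALL=3 python ../gaudi_spawn.py \
    --world_size 8 --use_deepspeed run_clm.py  # ... same flags as in the Reproduction command above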
Expected behavior

Qwen1.5-14B should run full-parameter fine-tuning successfully.
I can reproduce it, cc @libinta
@Zjq9409 Have you tried Qwen fine-tuning from the examples/trl side?
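For anyone trying that route, the invocation would be along the lines of the sketch below. The sft.py script name and its flags are assumptions based on the examples/trl folder (only the launcher flags and model path are carried over from the command above), so check that folder's README for the actual usage:

cd optimum-habana/examples/trl
# Hypothetical SFT fine-tuning launch; script name and flags are assumptions
python ../gaudi_spawn.py --world_size 8 --use_deepspeed sft.py \
    --model_name_or_path /data/models/Qwen1.5-7B-Chat/ \
    --use_habana \
    --use_lazy_mode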