diff --git a/frameworks/torch/torch-neuronx/training-troubleshooting.rst b/frameworks/torch/torch-neuronx/training-troubleshooting.rst
index edf6efca..289dc0c4 100644
--- a/frameworks/torch/torch-neuronx/training-troubleshooting.rst
+++ b/frameworks/torch/torch-neuronx/training-troubleshooting.rst
@@ -20,7 +20,7 @@ For setting up EFA that is needed for multi-node training, please see :ref:`setu
 For XLA-related troubleshooting notes see :ref:`How to debug models in PyTorch Neuron ` and `PyTorch-XLA troubleshooting
-guide `__.
+guide `__.
 If your multi-worker training run is interrupted, you may need to kill all the python processes (WARNING: this kills all python processes and