local_gpu with distributed training for single instance multi-gpu distributed training

**Describe the bug**
When testing single-instance, multi-GPU distributed training in a SageMaker local session, the run fails with an `IndexError` in `warn_if_parameter_server_with_multi_gpu`.

Input:
```
training_instance_type = 'local_gpu'
distributions = {'mpi': {'enabled': True, 'processes_per_host': 4}}
```
Stack trace:
```
    def warn_if_parameter_server_with_multi_gpu(training_instance_type, distributions):
        """Warn the user that training will not fully leverage all the GPU
        cores if parameter server is enabled and a multi-GPU instance is selected.
        Distributed training with the default parameter server setup doesn't
        support multi-GPU instances.

        Args:
            training_instance_type (str): A string representing the type of training instance selected.
            distributions (dict): A dictionary with information to enable distributed training.
                (Defaults to None if distributed training is not enabled.) For example:

                .. code:: python

                    {
                        'parameter_server':
                        {
                            'enabled': True
                        }
                    }


        """
        if training_instance_type == "local" or distributions is None:
            return

        is_multi_gpu_instance = (
>           training_instance_type.split(".")[1].startswith("p")
            and training_instance_type not in SINGLE_GPU_INSTANCE_TYPES
        )
E       IndexError: list index out of range

.tox/py37/lib/python3.7/site-packages/sagemaker/fw_utils.py:620: IndexError
```
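
The failing expression can be reproduced without the SDK. The sketch below is only an illustration: `instance_family` is a hypothetical helper name, and `ml.p3.8xlarge` is an example value that does not come from this report. It shows that a regular instance type splits into dot-separated parts, while `local_gpu` contains no dot, so index `[1]` does not exist.

```python
def instance_family(training_instance_type):
    """Return the family part of an instance type, e.g. 'p3' for 'ml.p3.8xlarge'.

    Hypothetical helper that isolates the expression from the stack trace.
    """
    return training_instance_type.split(".")[1]


for candidate in ("ml.p3.8xlarge", "local_gpu"):
    try:
        print(candidate, "->", instance_family(candidate))
    except IndexError as err:
        # 'local_gpu'.split(".") is ['local_gpu'], so there is no element at index 1.
        print(candidate, "-> IndexError:", err)
```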

**To reproduce**
```
tox -e py37 -- tests/integ/test_horovod_mx.py
```
- Custom Docker image, built from: https://github.com/ChaiBapchya/sagemaker-mxnet-training-toolkit/tree/mx_hvd_mpi
- Custom SageMaker Python SDK, built from source: https://github.com/ChaiBapchya/sagemaker-python-sdk/tree/mx_estimator_horovod_mpi

**Expected behavior**
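Single-instance, multi-GPU distributed training with MPI-based Horovod should run on `local_gpu` without this error. It currently fails because `'local_gpu'.split(".")` yields a single-element list, so index `[1]` is out of range.
Since I'm using Horovod (MPI) rather than a parameter server, this warning isn't relevant.
I suggest also adding `local_gpu` to the early-return check: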

```
if training_instance_type in ["local", "local_gpu"] or distributions is None:
    return
```
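
For context, here is a rough standalone sketch of how the guarded check could behave. It is not the SDK's implementation: `should_warn_parameter_server_multi_gpu` and `LOCAL_INSTANCE_TYPES` are hypothetical names, the `SINGLE_GPU_INSTANCE_TYPES` values are placeholders, and the way the parameter-server flag is read is assumed from the docstring example in the stack trace. It returns a bool rather than logging a warning so it is easy to test.

```python
# Placeholder values for illustration; the SDK defines its own SINGLE_GPU_INSTANCE_TYPES.
SINGLE_GPU_INSTANCE_TYPES = ("ml.p2.xlarge", "ml.p3.2xlarge")
# Hypothetical constant grouping the local modes that should skip the check.
LOCAL_INSTANCE_TYPES = ("local", "local_gpu")


def should_warn_parameter_server_multi_gpu(training_instance_type, distributions):
    """Return True when the parameter-server multi-GPU warning would apply."""
    # Local modes have no "ml.<family>.<size>" structure, so return early.
    if training_instance_type in LOCAL_INSTANCE_TYPES or distributions is None:
        return False

    is_multi_gpu_instance = (
        training_instance_type.split(".")[1].startswith("p")
        and training_instance_type not in SINGLE_GPU_INSTANCE_TYPES
    )
    # Assumed shape of the flag, following the docstring example.
    ps_enabled = distributions.get("parameter_server", {}).get("enabled", False)
    return is_multi_gpu_instance and ps_enabled


# The input from this report no longer raises and does not trigger the warning.
mpi = {"mpi": {"enabled": True, "processes_per_host": 4}}
print(should_warn_parameter_server_multi_gpu("local_gpu", mpi))
```

With the early return in place, the `split(".")[1]` expression is only reached for instance types that are not local modes.
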
**System information**
- **SageMaker Python SDK version**: Built from source [1.62.1.dev0]
- **Framework name (e.g. PyTorch) or algorithm (e.g. KMeans)**: MXNet
- **Framework version**: 1.6.0
- **Python version**: 3
- **CPU or GPU**: GPU
- **Custom Docker image (Y/N)**: Y


local_gpu with distributed training for single instance multi-gpu distributed training #1582

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

local_gpu with distributed training for single instance multi-gpu distributed training #1582

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions