This repository has been archived by the owner on Sep 18, 2024. It is now read-only.

More friendly message for aml mode #2723

Closed
JunweiSUN opened this issue Jul 23, 2020 · 1 comment

Comments

@JunweiSUN
Contributor

Log message:

  • nnimanager.log:
    [7/23/2020, 5:28:26 PM] INFO [ 'Datastore initialization done' ]
    [7/23/2020, 5:28:26 PM] INFO [ 'RestServer start' ]
    [7/23/2020, 5:28:26 PM] INFO [ 'RestServer base port is 10000' ]
    [7/23/2020, 5:28:26 PM] INFO [ 'Rest server listening on: http://0.0.0.0:10000' ]
    [7/23/2020, 5:28:26 PM] INFO [ 'NNIManager setClusterMetadata, key: aml_config, value: {"subscriptionId":"db9fc1d1-b44e-45a8-902d-8c766c255568","resourceGroup":"japanv100","workspaceName":"japanv100ws"}' ]
    [7/23/2020, 5:28:26 PM] INFO [ 'NNIManager setClusterMetadata, key: nni_manager_ip, value: {"nniManagerIp":"10.190.175.223"}' ]
    [7/23/2020, 5:28:26 PM] INFO [ 'NNIManager setClusterMetadata, key: trial_config, value: {"command":"python3 mnist.py","codeDir":"/home/v-junwsu/mnist-pytorch/.","computeTarget":"japan1gpucl","image":"msranni/nni"}' ]
    [7/23/2020, 5:28:26 PM] INFO [ 'Starting experiment: iBu27UTz' ]
    [7/23/2020, 5:28:26 PM] INFO [ 'Change NNIManager status from: INITIALIZED to: RUNNING' ]
    [7/23/2020, 5:28:27 PM] INFO [ 'Add event listeners' ]
    [7/23/2020, 5:28:27 PM] INFO [ 'TrialDispatcher: started channel: AMLCommandChannel' ]
    [7/23/2020, 5:28:27 PM] INFO [ 'TrialDispatcher: copying code and settings.' ]
    [7/23/2020, 5:28:40 PM] INFO [ 'NNIManager received command from dispatcher: ID, ' ]
    [7/23/2020, 5:28:40 PM] INFO [ 'NNIManager received command from dispatcher: TR, {"parameter_id": 0, "parameter_source": "algorithm", "parameters": {"batch_size": 128, "hidden_size": 1024, "lr": 0.1, "momentum": 0.4961588510530276}, "parameter_index": 0}' ]
    [7/23/2020, 5:28:40 PM] INFO [ 'NNIManager received command from dispatcher: TR, {"parameter_id": 1, "parameter_source": "algorithm", "parameters": {"batch_size": 128, "hidden_size": 512, "lr": 0.0001, "momentum": 0.09524983125988495}, "parameter_index": 0}' ]
    [7/23/2020, 5:28:40 PM] INFO [ 'TrialDispatcher: run loop started.' ]
    [7/23/2020, 5:28:45 PM] INFO [ 'submitTrialJob: form: {"sequenceId":0,"hyperParameters":{"value":"{\"parameter_id\": 0, \"parameter_source\": \"algorithm\", \"parameters\": {\"batch_size\": 128, \"hidden_size\": 1024, \"lr\": 0.1, \"momentum\": 0.4961588510530276}, \"parameter_index\": 0}","index":0}}' ]
    [7/23/2020, 5:28:45 PM] INFO [ 'submitTrialJob: form: {"sequenceId":1,"hyperParameters":{"value":"{\"parameter_id\": 1, \"parameter_source\": \"algorithm\", \"parameters\": {\"batch_size\": 128, \"hidden_size\": 512, \"lr\": 0.0001, \"momentum\": 0.09524983125988495}, \"parameter_index\": 0}","index":0}}' ]
    [7/23/2020, 5:28:46 PM] INFO [ 'request new environment, since live trials 2 is more than live environments 0' ]

What issue did you run into, and what was expected?
When using AML mode, if the user forgets to run az login, or forgets to install azureml and azureml-sdk, before starting the experiment, the experiment still starts normally and prints messages like:

INFO:  expand searchSpacePath: search_space.json to /home/v-junwsu/mnist-pytorch/search_space.json 
INFO:  expand codeDir: . to /home/v-junwsu/mnist-pytorch/. 
INFO:  Starting restful server...
INFO:  Successfully started Restful server!
INFO:  Setting aml config...
INFO:  Successfully set aml config!
INFO:  Starting experiment...
INFO:  Successfully started experiment!
------------------------------------------------------------------------------------
The experiment id is iBu27UTz
The Web UI urls are: 10.190.175.223:10000
------------------------------------------------------------------------------------

You can use these commands to get more information about the experiment
------------------------------------------------------------------------------------
         commands                       description
1. nnictl experiment show        show the information of experiments
2. nnictl trial ls               list all of trial jobs
3. nnictl top                    monitor the status of running experiments
4. nnictl log stderr             show stderr log content
5. nnictl log stdout             show stdout log content
6. nnictl stop                   stop an experiment
7. nnictl trial kill             kill a trial job by id
8. nnictl --help                 get help information about nnictl
------------------------------------------------------------------------------------
Command reference document https://nni.readthedocs.io/en/latest/Tutorial/Nnictl.html
------------------------------------------------------------------------------------

However, the experiment soon crashes, leaving an experiment directory in ~/nni/experiments with no trials in it. Should we display a more friendly message in this situation?
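As an illustration of the kind of pre-flight check this issue asks for, here is a minimal sketch in Python. The function names (module_available, az_logged_in, check_aml_prerequisites) are hypothetical and not part of NNI. It assumes that azureml.core is importable once azureml-sdk is installed, and that "az account show" exits with a non-zero status when no Azure account is logged in.

```python
import importlib
import shutil
import subprocess
from typing import Callable, List


def module_available(name: str) -> bool:
    """Return True if the (possibly dotted) module can be imported."""
    try:
        importlib.import_module(name)
        return True
    except ImportError:  # also covers ModuleNotFoundError
        return False


def az_logged_in() -> bool:
    """Return True if the Azure CLI is installed and has an active login.

    Assumption: 'az account show' exits non-zero when nobody is logged in.
    """
    az_path = shutil.which("az")
    if az_path is None:
        return False  # Azure CLI not installed at all
    result = subprocess.run(
        [az_path, "account", "show"], capture_output=True, text=True
    )
    return result.returncode == 0


def check_aml_prerequisites(
    module_check: Callable[[str], bool] = module_available,
    login_check: Callable[[], bool] = az_logged_in,
) -> List[str]:
    """Collect readable problems that would make an AML experiment crash.

    The checks are injectable so the logic can be tested without Azure.
    """
    problems = []
    if not module_check("azureml.core"):
        problems.append(
            "azureml-sdk is not installed; run: pip install azureml azureml-sdk"
        )
    if not login_check():
        problems.append("Azure CLI is not logged in; run: az login")
    return problems
```

If something like this ran before "Setting aml config...", nnictl could print each returned message as an error and stop, instead of reporting "Successfully started experiment!" for an experiment that is about to crash.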

@SparkSnail
Contributor

#2724
