This repository has been archived by the owner on Sep 18, 2024. It is now read-only.
nnimanager.log:
[7/23/2020, 5:28:26 PM] INFO [ 'Datastore initialization done' ]
[7/23/2020, 5:28:26 PM] INFO [ 'RestServer start' ]
[7/23/2020, 5:28:26 PM] INFO [ 'RestServer base port is 10000' ]
[7/23/2020, 5:28:26 PM] INFO [ 'Rest server listening on: http://0.0.0.0:10000' ]
[7/23/2020, 5:28:26 PM] INFO [ 'NNIManager setClusterMetadata, key: aml_config, value: {"subscriptionId":"db9fc1d1-b44e-45a8-902d-8c766c255568","resourceGroup":"japanv100","workspaceName":"japanv100ws"}' ]
[7/23/2020, 5:28:26 PM] INFO [ 'NNIManager setClusterMetadata, key: nni_manager_ip, value: {"nniManagerIp":"10.190.175.223"}' ]
[7/23/2020, 5:28:26 PM] INFO [ 'NNIManager setClusterMetadata, key: trial_config, value: {"command":"python3 mnist.py","codeDir":"/home/v-junwsu/mnist-pytorch/.","computeTarget":"japan1gpucl","image":"msranni/nni"}' ]
[7/23/2020, 5:28:26 PM] INFO [ 'Starting experiment: iBu27UTz' ]
[7/23/2020, 5:28:26 PM] INFO [ 'Change NNIManager status from: INITIALIZED to: RUNNING' ]
[7/23/2020, 5:28:27 PM] INFO [ 'Add event listeners' ]
[7/23/2020, 5:28:27 PM] INFO [ 'TrialDispatcher: started channel: AMLCommandChannel' ]
[7/23/2020, 5:28:27 PM] INFO [ 'TrialDispatcher: copying code and settings.' ]
[7/23/2020, 5:28:40 PM] INFO [ 'NNIManager received command from dispatcher: ID, ' ]
[7/23/2020, 5:28:40 PM] INFO [ 'NNIManager received command from dispatcher: TR, {"parameter_id": 0, "parameter_source": "algorithm", "parameters": {"batch_size": 128, "hidden_size": 1024, "lr": 0.1, "momentum": 0.4961588510530276}, "parameter_index": 0}' ]
[7/23/2020, 5:28:40 PM] INFO [ 'NNIManager received command from dispatcher: TR, {"parameter_id": 1, "parameter_source": "algorithm", "parameters": {"batch_size": 128, "hidden_size": 512, "lr": 0.0001, "momentum": 0.09524983125988495}, "parameter_index": 0}' ]
[7/23/2020, 5:28:40 PM] INFO [ 'TrialDispatcher: run loop started.' ]
[7/23/2020, 5:28:45 PM] INFO [ 'submitTrialJob: form: {"sequenceId":0,"hyperParameters":{"value":"{\"parameter_id\": 0, \"parameter_source\": \"algorithm\", \"parameters\": {\"batch_size\": 128, \"hidden_size\": 1024, \"lr\": 0.1, \"momentum\": 0.4961588510530276}, \"parameter_index\": 0}","index":0}}' ]
[7/23/2020, 5:28:45 PM] INFO [ 'submitTrialJob: form: {"sequenceId":1,"hyperParameters":{"value":"{\"parameter_id\": 1, \"parameter_source\": \"algorithm\", \"parameters\": {\"batch_size\": 128, \"hidden_size\": 512, \"lr\": 0.0001, \"momentum\": 0.09524983125988495}, \"parameter_index\": 0}","index":0}}' ]
[7/23/2020, 5:28:46 PM] INFO [ 'request new environment, since live trials 2 is more than live environments 0' ]
What issue did you meet, and what is expected?:
In aml mode, if the user forgets to run az login, or forgets to install azureml and azureml-sdk, before starting the experiment, the experiment still appears to start normally and prints messages like:
INFO: expand searchSpacePath: search_space.json to /home/v-junwsu/mnist-pytorch/search_space.json
INFO: expand codeDir: . to /home/v-junwsu/mnist-pytorch/.
INFO: Starting restful server...
INFO: Successfully started Restful server!
INFO: Setting aml config...
INFO: Successfully set aml config!
INFO: Starting experiment...
INFO: Successfully started experiment!
------------------------------------------------------------------------------------
The experiment id is iBu27UTz
The Web UI urls are: 10.190.175.223:10000
------------------------------------------------------------------------------------
You can use these commands to get more information about the experiment
------------------------------------------------------------------------------------
         commands                       description
1. nnictl experiment show        show the information of experiments
2. nnictl trial ls               list all of trial jobs
3. nnictl top                    monitor the status of running experiments
4. nnictl log stderr             show stderr log content
5. nnictl log stdout             show stdout log content
6. nnictl stop                   stop an experiment
7. nnictl trial kill             kill a trial job by id
8. nnictl --help                 get help information about nnictl
------------------------------------------------------------------------------------
Command reference document https://nni.readthedocs.io/en/latest/Tutorial/Nnictl.html
------------------------------------------------------------------------------------
However, the experiment crashes soon afterwards, leaving an experiment directory in ~/nni/experiments with no trials in it. Should we display a friendlier message in this situation?
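One possible shape for such a check, as a rough sketch only and not NNI's actual code: before the experiment starts, verify that the azureml package can be found, that the az CLI is on PATH, and that `az account show` exits with status 0 (it fails when no account is logged in). The function name `check_aml_prerequisites`, its parameters, and the message wording are all hypothetical, and the sketch assumes that the aml training service relies on the Azure CLI login, as this report suggests.

```python
import importlib.util
import shutil
import subprocess


def _exit_code(cmd):
    """Run a command quietly and return its exit code."""
    result = subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode


def check_aml_prerequisites(modules=("azureml",),
                            find_spec=importlib.util.find_spec,
                            which=shutil.which,
                            run=_exit_code):
    """Return a list of human-readable problems; an empty list means all checks passed.

    Hypothetical helper: the lookups are injectable so the logic can be
    exercised without Azure tooling installed.
    """
    problems = []

    # 1. Are the required Python packages visible to this interpreter?
    for name in modules:
        try:
            found = find_spec(name) is not None
        except (ImportError, ValueError):
            found = False
        if not found:
            problems.append(
                f"Python package '{name}' was not found; "
                "install azureml and azureml-sdk first."
            )

    # 2. Is the Azure CLI installed, and is an account logged in?
    az_path = which("az")
    if az_path is None:
        problems.append("Azure CLI ('az') was not found on PATH.")
    elif run([az_path, "account", "show"]) != 0:
        problems.append("Azure CLI is not logged in; run 'az login' first.")

    return problems
```

A caller could print each returned problem and stop before "Starting experiment..." is reported, instead of letting the experiment crash later with an empty experiment directory.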