Problems about the competition startkit #9

Open
sufeidechabei opened this issue Sep 1, 2019 · 9 comments

sufeidechabei commented Sep 1, 2019

It runs in a Docker environment when I use xvfb-run -s "-ac -screen 0 1280x1024x24" python train.py.
But when I run the script from the competition starter kit repo with xvfb-run -s "-ac -screen 0 1280x1024x24" ./utility/evaluate_locally.sh, I get the following error:

Traceback (most recent call last):
File "/root/anaconda3/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/root/anaconda3/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "/root/anaconda3/lib/python3.6/site-packages/minerl/env/malmo.py", line 896, in keep_alive_pyro
InstanceManager.add_keep_alive(os.getpid(), callback)
File "/root/anaconda3/lib/python3.6/site-packages/Pyro4/core.py", line 275, in getattr
self._pyroGetMetadata()
File "/root/anaconda3/lib/python3.6/site-packages/Pyro4/core.py", line 615, in _pyroGetMetadata
self.__pyroCreateConnection()
File "/root/anaconda3/lib/python3.6/site-packages/Pyro4/core.py", line 588, in __pyroCreateConnection
uri = _resolve(self._pyroUri, self._pyroHmacKey)
File "/root/anaconda3/lib/python3.6/site-packages/Pyro4/core.py", line 1911, in _resolve
return nameserver.lookup(uri.object)
File "/root/anaconda3/lib/python3.6/site-packages/Pyro4/core.py", line 185, in call
return self.__send(self.__name, args, kwargs)
File "/root/anaconda3/lib/python3.6/site-packages/Pyro4/core.py", line 476, in _pyroInvoke
raise data # if you see this in your traceback, you should probably inspect the remote traceback as well
Pyro4.errors.NamingError: unknown name: minerl.instance_manager

Traceback (most recent call last):
File "run.py", line 2, in
import train
File "/home/user/competition/train.py", line 14, in
from envs import diamond_env_creator
File "/home/user/competition/envs.py", line 10, in
from hack import minerl
File "/home/user/competition/hack.py", line 74, in
minerl.env.malmo.InstanceManager.get_instance = get_instance
File "/root/anaconda3/lib/python3.6/site-packages/Pyro4/core.py", line 291, in setattr
self._pyroGetMetadata()
File "/root/anaconda3/lib/python3.6/site-packages/Pyro4/core.py", line 615, in _pyroGetMetadata
self.__pyroCreateConnection()
File "/root/anaconda3/lib/python3.6/site-packages/Pyro4/core.py", line 588, in __pyroCreateConnection
uri = _resolve(self._pyroUri, self._pyroHmacKey)
File "/root/anaconda3/lib/python3.6/site-packages/Pyro4/core.py", line 1911, in _resolve
return nameserver.lookup(uri.object)
File "/root/anaconda3/lib/python3.6/site-packages/Pyro4/core.py", line 185, in call
return self.__send(self.__name, args, kwargs)
File "/root/anaconda3/lib/python3.6/site-packages/Pyro4/core.py", line 476, in _pyroInvoke
raise data # if you see this in your traceback, you should probably inspect the remote traceback as well
Pyro4.errors.NamingError: unknown name: minerl.instance_manager
+--- This exception occured remotely (Pyro) - Remote traceback:
| Traceback (most recent call last):
| File "/root/anaconda3/lib/python3.6/site-packages/Pyro4/naming.py", line 91, in lookup
| uri, metadata = self.storage[name]
| KeyError: 'minerl.instance_manager'
|
| During handling of the above exception, another exception occurred:
|
| Traceback (most recent call last):
| File "/root/anaconda3/lib/python3.6/site-packages/Pyro4/core.py", line 1421, in handleRequest
| data = method(*vargs, **kwargs) # this is the actual method call to the Pyro object
| File "/root/anaconda3/lib/python3.6/site-packages/Pyro4/naming.py", line 98, in lookup
| raise NamingError("unknown name: " + name)
| Pyro4.errors.NamingError: unknown name: minerl.instance_manager

@sufeidechabei
Author

@MadcowD @skbly7

@skbly7
Collaborator

skbly7 commented Sep 2, 2019

It looks like pyro4-ns is misbehaving on your side.

  1. Can you run ./utility/evaluation_locally.sh with the --verbose flag and share the output logs?
  2. Can you share the output of ps aux before and after running ./utility/evaluation_locally.sh?

Also, the script is currently named evaluation_locally.sh, not evaluate_locally.sh. Can you run git pull once as well?
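The before/after comparison requested above can be sketched in Python as follows. This is a minimal sketch, not part of the starter kit: the helper names `parse_ps`, `diff_snapshots`, and `snapshot` are hypothetical, and it assumes the usual 11-column `ps aux` layout with the PID in the second column.

```python
import subprocess


def parse_ps(output: str) -> dict:
    """Map PID -> command line from `ps aux` text (the header line is skipped)."""
    procs = {}
    for line in output.splitlines()[1:]:
        # `ps aux` prints 11 columns; the last one (COMMAND) may contain spaces.
        fields = line.split(None, 10)
        if len(fields) == 11 and fields[1].isdigit():
            procs[int(fields[1])] = fields[10]
    return procs


def diff_snapshots(before: str, after: str):
    """Return (started, exited) processes between two `ps aux` snapshots."""
    old, new = parse_ps(before), parse_ps(after)
    started = {pid: cmd for pid, cmd in new.items() if pid not in old}
    exited = {pid: cmd for pid, cmd in old.items() if pid not in new}
    return started, exited


def snapshot() -> str:
    """Capture the current process table (requires `ps` on PATH)."""
    result = subprocess.run(
        ["ps", "aux"],
        stdout=subprocess.PIPE,
        universal_newlines=True,  # text mode; also works on Python 3.6
        check=True,
    )
    return result.stdout
```

Calling `snapshot()` once before launching the evaluation script and once after it exits, then passing both strings to `diff_snapshots`, shows which processes were left running or disappeared in between.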

@sufeidechabei
Author

Hi

  1. I already used --verbose; the error output posted above is the log printed when running ./utility/evaluation_locally.sh --verbose.
  2. How do I do that? I don't fully understand.
  3. That was a typo in my post: I actually ran evaluation_locally.sh, not evaluate_locally.sh. I have also run git pull, but it did not help.

@skbly7

@skbly7
Collaborator

skbly7 commented Sep 2, 2019

  1. OK, are the logs complete? There might be log lines like the following, which would be helpful:
Not starting broadcast server for localhost.
NS running on localhost:9090 (127.0.0.1)
Warning: HMAC key not set. Anyone can connect to this server!
URI = PYRO:Pyro.NameServer@localhost:9090
Removing the performance directory!
autoproxy? True
Object <class 'minerl.env.malmo.InstanceManager'>:
    uri = PYRO:obj_885117b42aa24f578109d33960e8a186@localhost:39151
    name = minerl.instance_manager
Pyro daemon running.
  2. Sorry, I am assuming you exec'd into the Docker container and are running xvfb..... via bash? Run ps aux just before you run the command, and again after xvfb.... ends.
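To check whether a captured log contains the registration lines shown in the sample above, a small script along these lines may help. It is only a sketch: the function names are hypothetical, and it assumes that a registered object is announced on a line of the form `name = ...`, as in the sample output.

```python
import re

# Name taken from the NamingError reported in this thread.
EXPECTED_NAME = "minerl.instance_manager"


def registered_names(log_text: str) -> set:
    """Collect object names announced on `name = ...` lines in the log."""
    return set(re.findall(r"^\s*name = (\S+)\s*$", log_text, flags=re.MULTILINE))


def instance_manager_registered(log_text: str) -> bool:
    """True if the log shows the instance manager being registered."""
    return EXPECTED_NAME in registered_names(log_text)
```

If `instance_manager_registered` returns False for a complete log, the instance manager was probably never registered with the name server before the training script tried to look it up, which would be consistent with the `unknown name` error.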

@skbly7
Collaborator

skbly7 commented Sep 2, 2019

Meanwhile, it would be good to start fresh by running ./utility/docker_evaluation_locally.sh from your host system (not inside Docker). This command rebuilds the Docker image from scratch and starts the evaluation, in case your current Docker image is outdated.

@sufeidechabei
Author

sufeidechabei commented Sep 2, 2019

Thanks. Below are the complete logs. It prints the following output and then hangs:

root@94d903ce99a4:/home/user/competition# xvfb-run -s "-ac -screen 0 1280x1024x24" ./new/evaluation_locally.sh --verbose
Verifying (and downloading) MineRL dataset..
If you do not want to use the data:
run the local evaluation scripts with --no-data
If you want to use your existing download of the data:
make sure your MINERL_DATA_ROOT is set.

Data directory is data
Data verified! A+!
Not starting broadcast server for localhost.
NS running on localhost:9090 (127.0.0.1)
Warning: HMAC key not set. Anyone can connect to this server!
URI = PYRO:Pyro.NameServer@localhost:9090
WARNING:ray.rllib.utils.compression:lz4 not available, disabling sample compression. This will significantly impact RLlib performance. To install lz4, run pip install lz4.
2019-09-02 13:02:43,092 WARNING worker.py:1337 -- WARNING: Not updating worker name since setproctitle is not installed. Install this with pip install setproctitle (or ray[debug]) to enable monitoring of worker processes.
2019-09-02 13:02:43,093 INFO node.py:498 -- Process STDOUT and STDERR is being redirected to /tmp/ray/session_2019-09-02_13-02-43_092903_72280/logs.
2019-09-02 13:02:43,210 WARNING services.py:763 -- Redis failed to start, retrying now.
2019-09-02 13:02:43,329 INFO services.py:409 -- Waiting for redis server at 127.0.0.1:23248 to respond...
2019-09-02 13:02:43,448 INFO services.py:409 -- Waiting for redis server at 127.0.0.1:12994 to respond...
2019-09-02 13:02:43,451 INFO services.py:806 -- Starting Redis shard with 10.0 GB max memory.
2019-09-02 13:02:43,477 INFO node.py:512 -- Process STDOUT and STDERR is being redirected to /tmp/ray/session_2019-09-02_13-02-43_092903_72280/logs.
2019-09-02 13:02:43,477 WARNING services.py:1298 -- Warning: Capping object memory store to 20.0GB. To increase this further, specify object_store_memory when calling ray.init() or ray start.
2019-09-02 13:02:43,477 WARNING services.py:1323 -- WARNING: The object store is using /tmp instead of /dev/shm because /dev/shm has only 67108864 bytes available. This may slow down performance! You may be able to free up space by deleting files in /dev/shm or terminating any running plasma_store_server processes. If you are inside a Docker container, you may need to pass an argument with the flag '--shm-size' to 'docker run'.
2019-09-02 13:02:43,478 INFO services.py:1446 -- Starting the Plasma object store with 20.0 GB memory using /tmp.
2019-09-02 13:02:43,654 WARNING logger.py:139 -- Couldn't import TensorFlow - disabling TensorBoard logging.
2019-09-02 13:02:43,657 WARNING logger.py:233 -- Could not instantiate <class 'ray.tune.logger.TFLogger'> - skipping.
WARNING:ray.rllib.utils.compression:lz4 not available, disabling sample compression. This will significantly impact RLlib performance. To install lz4, run pip install lz4.
INFO:minerl.env.malmo.instance.84a92a:Starting Minecraft process: ['/tmp/tmp8___bykl/Minecraft/launchClient.sh', '-port', '9081', '-env', '-runDir', '/tmp/tmp8___bykl/Minecraft/run']
INFO:minerl.env.malmo.instance.84a92a:Starting process watcher for process 77277 @ localhost:9081
Exception in thread Thread-1:
Traceback (most recent call last):
File "/root/anaconda3/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/root/anaconda3/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "/root/anaconda3/lib/python3.6/site-packages/minerl/env/malmo.py", line 896, in keep_alive_pyro
InstanceManager.add_keep_alive(os.getpid(), callback)
File "/root/anaconda3/lib/python3.6/site-packages/Pyro4/core.py", line 275, in getattr
self._pyroGetMetadata()
File "/root/anaconda3/lib/python3.6/site-packages/Pyro4/core.py", line 615, in _pyroGetMetadata
self.__pyroCreateConnection()
File "/root/anaconda3/lib/python3.6/site-packages/Pyro4/core.py", line 588, in __pyroCreateConnection
uri = _resolve(self._pyroUri, self._pyroHmacKey)
File "/root/anaconda3/lib/python3.6/site-packages/Pyro4/core.py", line 1911, in _resolve
return nameserver.lookup(uri.object)
File "/root/anaconda3/lib/python3.6/site-packages/Pyro4/core.py", line 185, in call
return self.__send(self.__name, args, kwargs)
File "/root/anaconda3/lib/python3.6/site-packages/Pyro4/core.py", line 476, in _pyroInvoke
raise data # if you see this in your traceback, you should probably inspect the remote traceback as well
Pyro4.errors.NamingError: unknown name: minerl.instance_manager

Traceback (most recent call last):
File "run.py", line 2, in
import train
File "/home/user/competition/train.py", line 14, in
from envs import diamond_env_creator
File "/home/user/competition/envs.py", line 10, in
from hack import minerl
File "/home/user/competition/hack.py", line 74, in
minerl.env.malmo.InstanceManager.get_instance = get_instance
File "/root/anaconda3/lib/python3.6/site-packages/Pyro4/core.py", line 291, in setattr
self._pyroGetMetadata()
File "/root/anaconda3/lib/python3.6/site-packages/Pyro4/core.py", line 615, in _pyroGetMetadata
self.__pyroCreateConnection()
File "/root/anaconda3/lib/python3.6/site-packages/Pyro4/core.py", line 588, in __pyroCreateConnection
uri = _resolve(self._pyroUri, self._pyroHmacKey)
File "/root/anaconda3/lib/python3.6/site-packages/Pyro4/core.py", line 1911, in _resolve
return nameserver.lookup(uri.object)
File "/root/anaconda3/lib/python3.6/site-packages/Pyro4/core.py", line 185, in call
return self.__send(self.__name, args, kwargs)
File "/root/anaconda3/lib/python3.6/site-packages/Pyro4/core.py", line 476, in _pyroInvoke
raise data # if you see this in your traceback, you should probably inspect the remote traceback as well
Pyro4.errors.NamingError: unknown name: minerl.instance_manager
+--- This exception occured remotely (Pyro) - Remote traceback:
| Traceback (most recent call last):
| File "/root/anaconda3/lib/python3.6/site-packages/Pyro4/naming.py", line 91, in lookup
| uri, metadata = self.storage[name]
| KeyError: 'minerl.instance_manager'
|
| During handling of the above exception, another exception occurred:
|
| Traceback (most recent call last):
| File "/root/anaconda3/lib/python3.6/site-packages/Pyro4/core.py", line 1421, in handleRequest
| data = method(*vargs, **kwargs) # this is the actual method call to the Pyro object
| File "/root/anaconda3/lib/python3.6/site-packages/Pyro4/naming.py", line 98, in lookup
| raise NamingError("unknown name: " + name)
| Pyro4.errors.NamingError: unknown name: minerl.instance_manager
+--- End of remote traceback
./new/evaluation_locally.sh: line 43: kill: (72357) - No such process
root@94d903ce99a4:/home/user/competition# INFO:minerl.env.malmo:Minecraft process psutil.Process(pid=77279, status='terminated') terminated with exit code None
INFO:minerl.env.malmo:Minecraft process psutil.Process(pid=77277, status='terminated') terminated with exit code None
^C
root@94d903ce99a4:/home/user/competition# ^C
root@94d903ce99a4:/home/user/competition# ps aux xvfb-run -s "-ac -screen 0 1280x1024x24" ./new/evaluation_locally.sh --verbose
error: unsupported option (BSD syntax)

Usage:
ps [options]

Try 'ps --help <simple|list|output|threads|misc|all>'
or 'ps --help <s|l|o|t|m|a>'
for additional help text.

For more details see ps(1).
root@94d903ce99a4:/home/user/competition# xvfb-run -s "-ac -screen 0 1280x1024x24" ./new/evaluation_locally.sh --verbose ps aux
Verifying (and downloading) MineRL dataset..
If you do not want to use the data:
run the local evaluation scripts with --no-data
If you want to use your existing download of the data:
make sure your MINERL_DATA_ROOT is set.

Data directory is data
Data verified! A+!
Not starting broadcast server for localhost.
NS running on localhost:9090 (127.0.0.1)
Warning: HMAC key not set. Anyone can connect to this server!
URI = PYRO:Pyro.NameServer@localhost:9090
WARNING:ray.rllib.utils.compression:lz4 not available, disabling sample compression. This will significantly impact RLlib performance. To install lz4, run pip install lz4.
2019-09-02 13:17:51,657 WARNING worker.py:1337 -- WARNING: Not updating worker name since setproctitle is not installed. Install this with pip install setproctitle (or ray[debug]) to enable monitoring of worker processes.
2019-09-02 13:17:51,658 INFO node.py:498 -- Process STDOUT and STDERR is being redirected to /tmp/ray/session_2019-09-02_13-17-51_658243_77480/logs.
2019-09-02 13:17:51,775 INFO services.py:409 -- Waiting for redis server at 127.0.0.1:44695 to respond...
2019-09-02 13:17:51,893 INFO services.py:409 -- Waiting for redis server at 127.0.0.1:18180 to respond...
2019-09-02 13:17:51,895 INFO services.py:806 -- Starting Redis shard with 10.0 GB max memory.
2019-09-02 13:17:51,930 INFO node.py:512 -- Process STDOUT and STDERR is being redirected to /tmp/ray/session_2019-09-02_13-17-51_658243_77480/logs.
2019-09-02 13:17:51,933 WARNING services.py:1298 -- Warning: Capping object memory store to 20.0GB. To increase this further, specify object_store_memory when calling ray.init() or ray start.
2019-09-02 13:17:51,933 WARNING services.py:1323 -- WARNING: The object store is using /tmp instead of /dev/shm because /dev/shm has only 67108864 bytes available. This may slow down performance! You may be able to free up space by deleting files in /dev/shm or terminating any running plasma_store_server processes. If you are inside a Docker container, you may need to pass an argument with the flag '--shm-size' to 'docker run'.
2019-09-02 13:17:51,933 INFO services.py:1446 -- Starting the Plasma object store with 20.0 GB memory using /tmp.
2019-09-02 13:17:52,183 WARNING logger.py:139 -- Couldn't import TensorFlow - disabling TensorBoard logging.
2019-09-02 13:17:52,185 WARNING logger.py:233 -- Could not instantiate <class 'ray.tune.logger.TFLogger'> - skipping.
WARNING:ray.rllib.utils.compression:lz4 not available, disabling sample compression. This will significantly impact RLlib performance. To install lz4, run pip install lz4.
INFO:minerl.env.malmo.instance.b4f065:Starting Minecraft process: ['/tmp/tmp7trapicr/Minecraft/launchClient.sh', '-port', '9081', '-env', '-runDir', '/tmp/tmp7trapicr/Minecraft/run']
INFO:minerl.env.malmo.instance.b4f065:Starting process watcher for process 82476 @ localhost:9081
Exception in thread Thread-1:
Traceback (most recent call last):
File "/root/anaconda3/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/root/anaconda3/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "/root/anaconda3/lib/python3.6/site-packages/minerl/env/malmo.py", line 896, in keep_alive_pyro
InstanceManager.add_keep_alive(os.getpid(), callback)
File "/root/anaconda3/lib/python3.6/site-packages/Pyro4/core.py", line 275, in getattr
self._pyroGetMetadata()
File "/root/anaconda3/lib/python3.6/site-packages/Pyro4/core.py", line 615, in _pyroGetMetadata
self.__pyroCreateConnection()
File "/root/anaconda3/lib/python3.6/site-packages/Pyro4/core.py", line 588, in __pyroCreateConnection
uri = _resolve(self._pyroUri, self._pyroHmacKey)
File "/root/anaconda3/lib/python3.6/site-packages/Pyro4/core.py", line 1911, in _resolve
return nameserver.lookup(uri.object)
File "/root/anaconda3/lib/python3.6/site-packages/Pyro4/core.py", line 185, in call
return self.__send(self.__name, args, kwargs)
File "/root/anaconda3/lib/python3.6/site-packages/Pyro4/core.py", line 476, in _pyroInvoke
raise data # if you see this in your traceback, you should probably inspect the remote traceback as well
Pyro4.errors.NamingError: unknown name: minerl.instance_manager

Traceback (most recent call last):
File "run.py", line 2, in
import train
File "/home/user/competition/train.py", line 14, in
from envs import diamond_env_creator
File "/home/user/competition/envs.py", line 10, in
from hack import minerl
File "/home/user/competition/hack.py", line 74, in
minerl.env.malmo.InstanceManager.get_instance = get_instance
File "/root/anaconda3/lib/python3.6/site-packages/Pyro4/core.py", line 291, in setattr
self._pyroGetMetadata()
File "/root/anaconda3/lib/python3.6/site-packages/Pyro4/core.py", line 615, in _pyroGetMetadata
self.__pyroCreateConnection()
File "/root/anaconda3/lib/python3.6/site-packages/Pyro4/core.py", line 588, in __pyroCreateConnection
uri = _resolve(self._pyroUri, self._pyroHmacKey)
File "/root/anaconda3/lib/python3.6/site-packages/Pyro4/core.py", line 1911, in _resolve
return nameserver.lookup(uri.object)
File "/root/anaconda3/lib/python3.6/site-packages/Pyro4/core.py", line 185, in call
return self.__send(self.__name, args, kwargs)
File "/root/anaconda3/lib/python3.6/site-packages/Pyro4/core.py", line 476, in _pyroInvoke
raise data # if you see this in your traceback, you should probably inspect the remote traceback as well
Pyro4.errors.NamingError: unknown name: minerl.instance_manager
+--- This exception occured remotely (Pyro) - Remote traceback:
| Traceback (most recent call last):
| File "/root/anaconda3/lib/python3.6/site-packages/Pyro4/naming.py", line 91, in lookup
| uri, metadata = self.storage[name]
| KeyError: 'minerl.instance_manager'
|
| During handling of the above exception, another exception occurred:
|
| Traceback (most recent call last):
| File "/root/anaconda3/lib/python3.6/site-packages/Pyro4/core.py", line 1421, in handleRequest
| data = method(*vargs, **kwargs) # this is the actual method call to the Pyro object
| File "/root/anaconda3/lib/python3.6/site-packages/Pyro4/naming.py", line 98, in lookup
| raise NamingError("unknown name: " + name)
| Pyro4.errors.NamingError: unknown name: minerl.instance_manager
+--- End of remote traceback
./new/evaluation_locally.sh: line 43: kill: (77557) - No such process
./new/evaluation_locally.sh: line 44: 77479 Terminated pyro4-ns
root@94d903ce99a4:/home/user/competition# INFO:minerl.env.malmo:Minecraft process psutil.Process(pid=82478, status='terminated') terminated with exit code None
INFO:minerl.env.malmo:Minecraft process psutil.Process(pid=82476, status='terminated') terminated with exit code None
@skbly7

@sufeidechabei
Author

I suggest that your team provide an example of how to use the starter kit (with PPO, DQN, or another baseline); I think the current instructions are not very clear. You could also state the expected code format of every block. @skbly7

@sufeidechabei
Author

I don't know how to fix that bug. @skbly7

@sufeidechabei
Author

When I use docker_evaluation_locally.sh, there is still a bug. Here is the error log:

Note: Gathering environment variables from environ.sh
Building docker image, for skipping docker image build use "--no-build"
usage: aicrowd-repo2docker [-h] [--config CONFIG] [--json-logs]
[--image-name IMAGE_NAME] [--ref REF] [--debug]
[--no-build]
[--build-memory-limit BUILD_MEMORY_LIMIT]
[--no-run] [--publish PORTS] [--publish-all]
[--no-clean] [--push] [--volume VOLUMES]
[--user-id USER_ID] [--user-name USER_NAME]
[--env ENVIRONMENT] [--editable]
[--target-repo-dir TARGET_REPO_DIR]
[--appendix APPENDIX] [--subdir SUBDIR] [--version]
[--cache-from CACHE_FROM]
repo ...
aicrowd-repo2docker: error: argument --image-name: 'aicrowdneurips2019-minerl-challenge\r:agent\r' is not a valid docker image name. Image namemust start with an alphanumeric character andcan then use _ . or - in addition to alphanumeric.
To run your submission with nvidia drivers locally, use "--nvidia" with this script
--verbose
docker: invalid reference format: repository name must be lowercase.
See 'docker run --help'. @skbly7
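The `\r` characters in the rejected image name suggest, though nothing in this thread confirms it, that a file the script reads (possibly environ.sh) was saved with Windows (CRLF) line endings. The sketch below shows one way to detect and normalize them in Python. The function names and the variable names in the sample bytes are hypothetical, not taken from the starter kit.

```python
from pathlib import Path


def has_carriage_returns(data: bytes) -> bool:
    """True if the bytes contain any carriage return (CR) character."""
    return b"\r" in data


def to_unix_newlines(data: bytes) -> bytes:
    """Convert CRLF (and stray CR) line endings to LF."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def normalize_file(path: str) -> bool:
    """Rewrite the file with LF endings; return True if it was changed."""
    p = Path(path)
    original = p.read_bytes()
    fixed = to_unix_newlines(original)
    if fixed != original:
        p.write_bytes(fixed)
        return True
    return False


# Hypothetical file contents: the variable names are made up for illustration.
sample = b"export IMAGE_NAME=example-image\r\nexport IMAGE_TAG=agent\r\n"
cleaned = to_unix_newlines(sample)
```

If the assumption holds, running `normalize_file` on the affected file (or cloning the repository with line-ending conversion disabled) and rerunning the script would be a reasonable next step.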
