Evaluation worker feature #192

Open
wants to merge 1 commit into
base: master

alex-petrenko
Contributor

This adds a new feature that allows real-time evaluation and visualization of agents during a training session.
The evaluation worker is meant to run in a separate process from the training session, so evaluation can use a small number of agents (e.g. 1 or 64) while leaving enough resources to train on thousands of agents.

This will be triggered in isaacgymenvs using a new flag. Alternatively, the player can be started with evaluation=True dir_to_monitor=/some/experiment/dir/containing/checkpoints to monitor an existing training session.

A player with evaluation=True will continuously monitor the experiment dir for new checkpoints, load them, and visualize the new policy.
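
As a rough illustration of the monitoring loop described above, here is a minimal sketch. It deliberately swaps in plain standard-library polling instead of the watchdog file monitor this PR uses, so it is not the PR's implementation. The names `newest_checkpoint`, `monitor_checkpoints`, `on_new_checkpoint`, `poll_interval`, and `max_polls` are hypothetical, and loading and visualizing the policy is reduced to a callback.

```python
import time
from pathlib import Path
from typing import Callable, Optional


def newest_checkpoint(directory: Path, pattern: str = "*.pth") -> Optional[Path]:
    """Return the most recently modified file matching `pattern`, or None."""
    candidates = [p for p in directory.glob(pattern) if p.is_file()]
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.stat().st_mtime)


def monitor_checkpoints(
    directory,
    on_new_checkpoint: Callable[[Path], None],
    poll_interval: float = 1.0,
    max_polls: Optional[int] = None,
) -> None:
    """Poll `directory` and call `on_new_checkpoint` when the newest checkpoint changes.

    `max_polls` is a hypothetical knob that bounds the loop (useful in tests);
    None means poll forever.
    """
    directory = Path(directory)
    last_seen = None  # (path, mtime) of the last checkpoint passed to the callback
    polls = 0
    while max_polls is None or polls < max_polls:
        latest = newest_checkpoint(directory)
        if latest is not None:
            key = (latest, latest.stat().st_mtime)
            if key != last_seen:
                last_seen = key
                # In a real player this is where the checkpoint would be
                # loaded and the new policy visualized.
                on_new_checkpoint(latest)
        polls += 1
        time.sleep(poll_interval)
```

Polling is simpler to reason about than file-system events but reacts only once per interval, and it may pick up a checkpoint that is still being written. An event-based monitor such as watchdog avoids the fixed delay.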

@alex-petrenko
Contributor Author

@ViktorM FYI

@Denys88
Owner

Denys88 commented Jul 27, 2022

@alex-petrenko will it work with IG on one GPU? As I remember, I cannot create a second IG instance on the same GPU anyway.

@alex-petrenko
Contributor Author

alex-petrenko commented Jul 28, 2022

@Denys88 it worked fine on my 1080Ti provided there's enough memory, although I only tried on my machine.
Is there a fundamental reason why two IG can't coexist?

@alex-petrenko
Contributor Author

@ViktorM this is the version we'll need to use for the demo

@Denys88
Owner

Denys88 commented Aug 22, 2022

@alex-petrenko please let me know if you are going to add a few more changes, or whether I can just merge it and refactor later.
Btw, do you still need #195?

@alex-petrenko
Contributor Author

alex-petrenko commented Aug 22, 2022

@Denys88 I think it's solid and works reliably. We were able to use it with both IGE and Omniverse IsaacGym.
It should be rather safe to merge since it does not do anything unless the evaluation flag is turned on.

If you don't want the file monitor (watchdog) to be in the main list of dependencies, you can remove it from setup.py and, in the evaluation section of the code, add a warning that it needs to be installed.
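
One way the optional-dependency idea above could look is sketched below, using only the standard library. The helper `optional_import` and its parameters are hypothetical and not part of this PR; the sketch only shows the general pattern of importing the package when the feature is requested and warning if it is missing.

```python
import importlib
import warnings


def optional_import(module_name: str, feature: str):
    """Return the imported module, or None (with a warning) if it is not installed.

    Hypothetical helper: `feature` is only used to make the warning readable.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError:
        warnings.warn(
            f"'{module_name}' is required for {feature} but is not installed; "
            "install it to enable this feature.",
            RuntimeWarning,
            stacklevel=2,
        )
        return None


# Usage sketch: attempt the import only when evaluation is actually requested,
# so a regular training run never needs the extra package.
# watchdog = optional_import("watchdog", "the evaluation worker")
# if watchdog is None:
#     ...  # skip or disable evaluation
```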

@Denys88 not sure about #195 - this is something @ArthurAllshire should know more about

@ViktorM
Collaborator

ViktorM commented Aug 23, 2022

@Denys88 is it good to go?

@Denys88
Owner

Denys88 commented Aug 23, 2022

@ViktorM not yet. I need to test the envpool and ray vecenvs first, and update the readme.
you can create a block with a new version.

os.makedirs(self.eval_checkpoint_dir, exist_ok=True)

patterns = ["*.pth"]
from watchdog.observers import Observer
Owner

Can we move this logic to a separate file?


3 participants