
Allow environment to set next action #57

Closed
mryellow wants to merge 2 commits

Conversation

@mryellow (Contributor) commented Sep 2, 2016

No description provided.

@mryellow (Contributor, Author) commented Sep 2, 2016

This might also help with Hierarchical DQN #9, ignoring the complexities of storing skill state in experiences.

The environment could be executing a skill network and feeding back to the core that it is repeating the skill action (or, if desired, the individual actions being taken as part of the skill).
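A minimal sketch of what that feedback could look like, assuming an rlenvs-style `step`; `self.skill`, `self.skillStepsLeft`, the internal `act` helper, and the extra return value are all illustrative, not part of the real API:

```lua
-- Sketch only: while a skill is active the environment overrides the agent's
-- chosen action and reports back the action actually taken, so the core can
-- store the correct action in its experiences.
function Env:step(action)
  local actionTaken = nil
  if self.skillStepsLeft and self.skillStepsLeft > 0 then
    actionTaken = self.skill:act(self.screen) -- primitive action from the skill network
    self.skillStepsLeft = self.skillStepsLeft - 1
    action = actionTaken
  end
  local reward, observation, terminal = self:act(action) -- illustrative internal helper
  return reward, observation, terminal, actionTaken -- nil when nothing was forced
end
```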

@mryellow (Contributor, Author) commented Sep 2, 2016

p.s. This isn't implemented for validation agents. For my use-case it makes sense for them to avoid any hard-coded behaviours and instead give metrics on what has actually been learnt.

For Hierarchical DQN, or any kind of skill-execution setup, you'd want the validation agents to act in the same way, with the environment having some control over what is being scored. Perhaps that is best done in the environment, with the validation agent being detected somehow and told to work differently if desired.
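One possible way to detect that, sketched below assuming the rlenvs-style `training()`/`evaluate()` mode switch; the `mode` field and `forcedAction` helper are made up for illustration:

```lua
-- Sketch: track the mode set by the agent and only force actions in training,
-- so validation agents are scored purely on learnt behaviour.
function Env:training()
  self.mode = 'training'
end

function Env:evaluate()
  self.mode = 'evaluate'
end

function Env:forcedAction()
  if self.mode ~= 'training' then
    return nil -- validation: never hard-code behaviour
  end
  -- ... skill/override logic would go here ...
end
```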

@mryellow (Contributor, Author) commented Sep 3, 2016

https://github.com/mryellow/dqn_assets/blob/85a90375f349b37399a6f2ecf2d47ac25f697f66/rlenvs/Kulbabu.lua#L185-L195

An implementation: if the agent hasn't moved "greatly" within a "reasonable" time, repeat either a left or right turn for a number of seconds.

edit: Updated it to only run when training. p.s. Noticed `training`/`evaluate` aren't documented in the rlenvs README.
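For reference, a rough sketch of that stuck-detection logic; the thresholds, action indices, and field names are approximations for illustration, not copied from the linked code:

```lua
-- Sketch: if the robot hasn't moved beyond a distance threshold within a
-- check interval, force a random left/right turn for a fixed number of steps.
-- Only applies while training, so evaluation is unaffected.
local LEFT, RIGHT = 1, 2 -- illustrative action indices
local CHECK_INTERVAL, MOVE_THRESHOLD, TURN_STEPS = 100, 0.05, 20

if self.mode == 'training' and self.stepCount - self.lastCheck > CHECK_INTERVAL then
  local dist = math.sqrt((self.x - self.lastX)^2 + (self.y - self.lastY)^2)
  if dist < MOVE_THRESHOLD then
    -- Stuck: pick a turn direction and repeat it for TURN_STEPS steps
    self.forceAction = math.random(2) == 1 and LEFT or RIGHT
    self.forceSteps = TURN_STEPS
  end
  self.lastX, self.lastY, self.lastCheck = self.x, self.y, self.stepCount
end
```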

@Kaixhin (Owner) commented Sep 3, 2016

I realise that this is more efficient than overwriting an action the agent has already decided upon, but I worry that it's too restrictive. This forcing is conditional only on the state, whereas the previous approach was conditional on both the state and the action.

For a real example that I am working with, take a crane game. The agent should be allowed to move as it wants, but we would like to stop it from executing the expensive action of dropping the claw when it's far from its target.
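A sketch of that state-and-action-conditional filtering for the crane example; the action indices, `distanceToTarget` helper, and `dropRadius` field are invented for illustration:

```lua
-- Sketch: movement actions pass through unchanged; the expensive DROP action
-- is replaced with a no-op whenever the claw is far from its target.
local NOOP, DROP = 1, 4 -- illustrative action indices

function Env:filterAction(action)
  if action == DROP and self:distanceToTarget() > self.dropRadius then
    return NOOP -- veto the drop, but leave movement decisions to the agent
  end
  return action
end
```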

@mryellow closed this Sep 3, 2016