Allow overwriting action by environment #56
base: master
Conversation
```diff
-  reward, terminal, state = self:takeAction(action)
+  reward, terminal, state, taken = self:takeAction(action)
+  if taken and taken ~= action then
+    self.actions[self.batchIdx] = taken
```
Could do it this way around, looking more like OneStepQAgent:

```lua
if taken and taken ~= action then
  action = taken
end
self.actions[self.batchIdx] = action
```
Code looks fine, but what is this trying to achieve? If the chosen action may not be deterministically executed in the environment, the agent should still treat it as if the chosen action were taken; otherwise it will not adapt to the real stochasticity in the application of its action. For an ALE example see Increasing the Action Gap.
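To make the trade-off concrete, here is a minimal sketch of the two options being discussed; all names are illustrative rather than taken from the repo:

```lua
-- 'chosen' is the agent's selection; 'taken' is what the environment
-- reports it actually executed (nil if it never overrides actions).
local function actionToStore(chosen, taken, adaptToStochasticity)
  if adaptToStochasticity then
    -- The point above: keep the chosen action so the agent learns to
    -- absorb stochasticity in how its actions are applied.
    return chosen
  end
  -- This PR: learn from the overwritten action when one is reported.
  if taken and taken ~= chosen then
    return taken
  end
  return chosen
end
```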
Expert instruction is one application: the agent can forget whatever the net came up with and learn from experiences that have been overwritten with expert play, or some hand-crafted strategy. The last time I used it was to force a robot to repeat turning-on-the-spot actions at random intervals when epsilon was below a threshold, to escape corners and gather some better experiences again.
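A rough sketch of that robot use case, written as a wrapper around the environment's step; every name here (ExpertEnv, overrideProb, turnAction) is hypothetical, not from this repo:

```lua
-- Hypothetical wrapper: below an epsilon threshold, occasionally force a
-- turn-on-the-spot action so the robot escapes corners and gathers
-- better experiences again.
local ExpertEnv = {}
ExpertEnv.__index = ExpertEnv

function ExpertEnv.new(env, turnAction, epsilonThreshold, overrideProb)
  return setmetatable({
    env = env,
    turnAction = turnAction,
    epsilonThreshold = epsilonThreshold,
    overrideProb = overrideProb or 0.1
  }, ExpertEnv)
end

function ExpertEnv:step(action, epsilon)
  local taken = action
  if epsilon < self.epsilonThreshold and math.random() < self.overrideProb then
    taken = self.turnAction -- hand-crafted strategy overwrites the net's choice
  end
  local reward, observation, terminal = self.env:step(taken)
  -- Returning 'taken' lets the agent store and learn from the expert action
  return reward, observation, terminal, taken
end
```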
Got it - can you add a short 2nd paragraph to the custom docs to make people aware of this modification from the
Yeah no problem. Describing it just now, my method has always been a hack of discarding the agent's selection after doing the unneeded work. Could pass a variable back toward

Doesn't really matter for my robot use-case, as I'm throwing away CPU cycles to run at real-time (10Hz) rather than as fast as possible; guess it matters when it comes to battery drain on hardware.

For real-time expert play, wonder if that

Not a lot happens before it's back to the start of the loop again.
Closing in favour of #57, which avoids doing work only to then discard it.
Was thinking of adding the extra return to the validation agents as well, allowing the environment to pass back any change during validation too. However, nothing is done with the action apart from acting. Could add the signature so the variable is already there if it's needed in future... So far prioritised replay hasn't crashed with the code this way around; firing off the observe methods must play into setting up the distribution stratum at some point.
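For reference, a minimal sketch of what that signature change might look like on the validation side; the surrounding method is assumed, not taken from the repo:

```lua
-- Accept the fourth return even though validation only acts. 'taken' is
-- unused for now, but the variable is already in place if a future change
-- wants to react to environment overrides during validation.
local reward, terminal, state, taken = self:takeAction(action)
```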
```diff
@@ -91,7 +91,7 @@ function AsyncAgent:takeAction(action)
     self.stateBuffer:push(observation)
   end

-  return reward, terminal, self.stateBuffer:readAll()
+  return reward, terminal, self.stateBuffer:readAll(), actionTaken
```
Could add the offset `actionOffset` before returning so it doesn't need adding to compare `actionTaken ~= action`.

edit: Done
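A sketch of that change, assuming `actionOffset` converts the environment's raw action index back to the agent's indexing (the sign, and the surrounding code, may differ in the actual repo):

```lua
-- Apply the offset before returning so callers can compare
-- actionTaken ~= action directly, without adjusting on their side.
if actionTaken then
  actionTaken = actionTaken + self.actionOffset
end
return reward, terminal, self.stateBuffer:readAll(), actionTaken
```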
Think I had some additional documentation changes to do on this. Been distracted working on hardware and getting the dev environment fixed up; once that's outta the way I'll get this cleaned up.
Will get to finishing this off soonish.