-
Notifications
You must be signed in to change notification settings - Fork 4.3k
Goal conditioning grid world : Example of goal conditioning #5193
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
m_ResetParams = Academy.Instance.EnvironmentParameters; | ||
} | ||
|
||
public override void CollectObservations(VectorSensor sensor) | ||
{ | ||
Array values = Enum.GetValues(typeof(GridGoal)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this happen somewhere else? It feels like abuse of CollectObservations(), since it's not touching the input VectorSensor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
VectorSensor is null here, I do not see an issue with this. Goal Signal is an observation, so it makes sense to me that it is called in CollectObservation.
Would it be better if I put this logic into a CollectGoal
method with no arguments that I call in CollectObservations ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CollectGoal is maybe for the example (but let's not add it Agent). Let me think about a better way.
One problem (which I didn't realize until now) is that we don't check for null CollectObservationsSensor during the normal update step:
ml-agents/com.unity.ml-agents/Runtime/Agent.cs
Line 1062 in e4e9c51
CollectObservations(collectObservationsSensor); |
but we do check for null when the agent is done:
ml-agents/com.unity.ml-agents/Runtime/Agent.cs
Lines 563 to 571 in e4e9c51
if (collectObservationsSensor != null) | |
{ | |
// Make sure the latest observations are being passed to training. | |
collectObservationsSensor.Reset(); | |
using (m_CollectObservationsChecker.Start()) | |
{ | |
CollectObservations(collectObservationsSensor); | |
} | |
} |
@@ -105,17 +147,29 @@ public override void OnActionReceived(ActionBuffers actionBuffers) | |||
|
|||
if (hit.Where(col => col.gameObject.CompareTag("goal")).ToArray().Length == 1) | |||
{ | |||
SetReward(1f); | |||
ProvideReward(GridGoal.Plus); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is pretty confusing since the "goal" tag doesn't really mean that it's the goal anymore. Can you change them to e.g. "plus" and "ex"?
Or maybe this would be a good opportunity to stop using physics collision checks, and change the example to use a 2D array of enums? That would probably speed up training too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed the tags to plus and ex. I think making the grid a 2D array of enums is a good idea, but out of scope for this.
(sorry, can't comment inline for the file removal).
(and that should have failed the link checker) |
|
Might not be related to this PR, but should we add a warning in the docs about using hypernetworks for larger |
There is this line in the documentation:
I am hesitant to throw a warning if the model is going to be large because we never know what the user has in mind... |
@@ -82,16 +82,16 @@ you would like to contribute environments, please see our | |||
|
|||
 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possible to link to this environment in the goal signal docs and the Changelog? Just in case a user wants an example of how to use these features
Proposed change(s)
Making GridWorld use the new goal conditioning.
Useful links (Github issues, JIRA tickets, ML-Agents forum threads etc.)
Types of change(s)
Checklist
Other comments