Goal conditioning grid world : Example of goal conditioning #5193

Merged
vincentpierre merged 10 commits into main from goal-conditioning-grid-world-3 on Mar 31, 2021

Conversation

vincentpierre
Contributor

Proposed change(s)

Making GridWorld use the new goal conditioning.

Useful links (Github issues, JIRA tickets, ML-Agents forum threads etc.)

Types of change(s)

  • Bug fix
  • New feature
  • Code refactor
  • Breaking change
  • Documentation update
  • Other (please describe)

Checklist

  • Added tests that prove my fix is effective or that my feature works
  • Updated the changelog (if applicable)
  • Updated the documentation (if applicable)
  • Updated the migration guide (if applicable)

Other comments

@vincentpierre vincentpierre self-assigned this Mar 29, 2021
@vincentpierre vincentpierre marked this pull request as ready for review March 29, 2021 20:53
@vincentpierre vincentpierre changed the title Goal conditioning grid world 3 Goal conditioning grid world : Example of goal conditioning Mar 29, 2021
        m_ResetParams = Academy.Instance.EnvironmentParameters;
    }

    public override void CollectObservations(VectorSensor sensor)
    {
        Array values = Enum.GetValues(typeof(GridGoal));
Contributor

Can this happen somewhere else? It feels like abuse of CollectObservations(), since it's not touching the input VectorSensor.

Contributor Author

VectorSensor is null here; I do not see an issue with this. The goal signal is an observation, so it makes sense to me that it is collected in CollectObservations.
Would it be better if I put this logic into a CollectGoal method with no arguments that I call in CollectObservations?
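The CollectGoal idea raised here can be sketched in isolation. The following is a minimal standalone Python illustration, not the Unity C# code from this PR: `ToyAgent`, `one_hot`, and the `EX` member of `GridGoal` are hypothetical names (only `GridGoal.Plus` appears in the diff). It shows the goal logic living in a no-argument method that the observation hook merely delegates to.

```python
from enum import Enum


class GridGoal(Enum):
    # PLUS mirrors GridGoal.Plus from the diff; EX is an assumed second member.
    PLUS = 0
    EX = 1


def one_hot(index, size):
    """Return a list of length `size` with 1.0 at `index` and 0.0 elsewhere."""
    if not 0 <= index < size:
        raise ValueError(f"index {index} out of range for size {size}")
    return [1.0 if i == index else 0.0 for i in range(size)]


class ToyAgent:
    """Hypothetical stand-in for the agent; not the ML-Agents Agent class."""

    def __init__(self, goal):
        self.current_goal = goal
        self.goal_observation = []

    def collect_goal(self):
        # The goal logic lives in its own no-argument method.
        members = list(GridGoal)
        index = members.index(self.current_goal)
        self.goal_observation = one_hot(index, len(members))

    def collect_observations(self, sensor=None):
        # The observation hook only delegates and never touches `sensor`.
        self.collect_goal()
```

Keeping the helper on the example agent, rather than on a shared base class, matches the reviewer's preference not to add it to Agent.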

Contributor

CollectGoal is maybe fine for the example (but let's not add it to Agent). Let me think about a better way.

One problem (which I didn't realize until now) is that we don't check for a null collectObservationsSensor during the normal update step:

CollectObservations(collectObservationsSensor);

but we do check for null when the agent is done:

    if (collectObservationsSensor != null)
    {
        // Make sure the latest observations are being passed to training.
        collectObservationsSensor.Reset();
        using (m_CollectObservationsChecker.Start())
        {
            CollectObservations(collectObservationsSensor);
        }
    }
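The asymmetry between the two call sites can be reproduced with a toy model. This is a standalone Python sketch, not ML-Agents code: `ToyAgent`, `on_step`, and `on_done` are hypothetical names, and the sensor reset and timing checker from the real done path are omitted. It shows that an override which ignores its sensor argument runs on a normal step but is skipped at episode end when the sensor is null.

```python
class ToyAgent:
    """Toy model of the two call sites quoted above; not ML-Agents code."""

    def __init__(self, sensor=None):
        self.sensor = sensor  # stands in for collectObservationsSensor
        self.collect_calls = 0

    def collect_observations(self, sensor):
        # An override that ignores `sensor`, like the goal logic under review.
        self.collect_calls += 1

    def on_step(self):
        # Normal update: called unconditionally, even if the sensor is None.
        self.collect_observations(self.sensor)

    def on_done(self):
        # Episode end: skipped entirely when the sensor is None.
        if self.sensor is not None:
            self.collect_observations(self.sensor)


agent = ToyAgent(sensor=None)
agent.on_step()
agent.on_done()
print(agent.collect_calls)  # 1: the done-path call never happened
```

Under these assumptions, goal logic placed in the override would not be refreshed on the final step of an episode.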

@@ -105,17 +147,29 @@ public override void OnActionReceived(ActionBuffers actionBuffers)

if (hit.Where(col => col.gameObject.CompareTag("goal")).ToArray().Length == 1)
{
    SetReward(1f);
    ProvideReward(GridGoal.Plus);
Contributor

This is pretty confusing since the "goal" tag doesn't really mean that it's the goal anymore. Can you change them to e.g. "plus" and "ex"?

Or maybe this would be a good opportunity to stop using physics collision checks, and change the example to use a 2D array of enums? That would probably speed up training too.

Contributor Author

I changed the tags to plus and ex. I think making the grid a 2D array of enums is a good idea, but out of scope for this.
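For reference, the 2D-array-of-enums idea that was deferred could look roughly like the following. This is a standalone Python sketch rather than Unity C#: `Cell`, `GRID`, `cell_at`, and `reward_for` are hypothetical names, and the +1/-1 reward scheme is an assumption for illustration, not taken from the PR.

```python
from enum import Enum


class Cell(Enum):
    EMPTY = 0
    PLUS = 1
    EX = 2


# A tiny 2x2 grid; a real environment would generate this at reset.
GRID = [
    [Cell.EMPTY, Cell.PLUS],
    [Cell.EX, Cell.EMPTY],
]


def cell_at(grid, row, col):
    """Return the cell at (row, col), or None if the position is off-grid."""
    if 0 <= row < len(grid) and 0 <= col < len(grid[row]):
        return grid[row][col]
    return None


def reward_for(cell, goal):
    """Assumed scheme: +1 for the current goal, -1 for the other marker."""
    if cell is None or cell is Cell.EMPTY:
        return 0.0
    return 1.0 if cell is goal else -1.0
```

A lookup like `cell_at(GRID, 0, 1)` replaces the tag-based collision query with a plain index into the array, which is where the expected speed-up would come from.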

@chriselion
Contributor

(sorry, can't comment inline for the file removal). gridworld.png is still being referenced:

$ git grep gridworld.png 
docs/Learning-Environment-Design-Agents.md:![Agent RenderTexture Debug](images/gridworld.png)
docs/Learning-Environment-Examples.md:![GridWorld](images/gridworld.png)

(and that should have failed the link checker)
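The kind of check a link checker performs here can be sketched as follows. This is a minimal Python illustration, not the project's actual link checker: `IMAGE_LINK` and `missing_images` are hypothetical names, and the regular expression only handles simple inline image links without titles.

```python
import re
from pathlib import Path

# Matches inline Markdown images such as ![GridWorld](images/gridworld.png).
IMAGE_LINK = re.compile(r"!\[[^\]]*\]\(([^)\s]+)\)")


def missing_images(markdown_text, base_dir):
    """Return image targets in `markdown_text` that do not exist under `base_dir`."""
    missing = []
    for target in IMAGE_LINK.findall(markdown_text):
        if target.startswith(("http://", "https://")):
            continue  # remote images are out of scope for this sketch
        if not (Path(base_dir) / target).is_file():
            missing.append(target)
    return missing
```

Run against each file under docs/, a non-empty result would flag a reference to an image that was removed from the repository.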

@vincentpierre
Contributor Author

> gridworld.png is still being referenced

gridworld.png is still there (it is only smaller).

@ervteng
Contributor

ervteng commented Mar 29, 2021

Might not be related to this PR, but should we add a warning in the docs about using hypernetworks for larger hidden_units values? We might even be able to auto-detect it in settings.py, e.g. if the resulting model will be bigger than 50mb print a warning

@vincentpierre
Contributor Author

> Might not be related to this PR, but should we add a warning in the docs about using hypernetworks for larger hidden_units values? We might even be able to auto-detect it in settings.py, e.g. if the resulting model will be bigger than 50mb print a warning

There is this line in the documentation:

If set to `hyper` (default) a [HyperNetwork](https://arxiv.org/pdf/1609.09106.pdf)
will be used to generate some of the
weights of the policy using the goal observations as input. Note that using a
HyperNetwork requires a lot of computations, it is recommended to use a smaller
number of hidden units in the policy to alleviate this.

I am hesitant to throw a warning if the model is going to be large because we never know what the user has in mind...
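To make the trade-off concrete, here is a rough sketch of what such an auto-detection could look like. It is a standalone Python illustration and not code from settings.py: the function names and `hyper_hidden_units` are hypothetical, and the size formula assumes the hypernetwork's output layer is a dense float32 map that emits the weights of a single hidden_units x hidden_units policy layer. Only the 50 MB threshold comes from the suggestion above.

```python
import warnings

BYTES_PER_FLOAT32 = 4
SIZE_WARNING_BYTES = 50 * 1024 * 1024  # the 50 MB figure suggested above


def estimated_hyper_layer_bytes(hidden_units, hyper_hidden_units):
    """Estimate the size of the hypernetwork output layer in bytes.

    Assumption: a dense layer maps `hyper_hidden_units` features to
    hidden_units**2 generated weights, stored as float32.
    """
    generated = hidden_units * hidden_units
    params = hyper_hidden_units * generated + generated  # weights + biases
    return params * BYTES_PER_FLOAT32


def warn_if_large(hidden_units, hyper_hidden_units):
    """Emit a warning and return True if the estimate exceeds the threshold."""
    size = estimated_hyper_layer_bytes(hidden_units, hyper_hidden_units)
    if size > SIZE_WARNING_BYTES:
        warnings.warn(
            f"Estimated hypernetwork layer size is {size / 1e6:.0f} MB; "
            "consider a smaller hidden_units value."
        )
        return True
    return False
```

Because the output layer grows with the square of hidden_units, the estimate rises quickly: under these assumptions 128 units stays well under the threshold while 512 units exceeds it many times over. A warning rather than an error leaves the choice to the user, which is the concern raised in the reply.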

@@ -82,16 +82,16 @@ you would like to contribute environments, please see our

![GridWorld](images/gridworld.png)
Contributor

Is it possible to link to this environment in the goal signal docs and the changelog? Just in case a user wants an example of how to use these features.

@vincentpierre vincentpierre merged commit 92ff2c2 into main Mar 31, 2021
@delete-merged-branch delete-merged-branch bot deleted the goal-conditioning-grid-world-3 branch March 31, 2021 22:17
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 1, 2022
3 participants