Goal conditioning grid world : Example of goal conditioning #5193

vincentpierre · 2021-03-29T20:53:44Z

Proposed change(s)

Making GridWorld use the new goal conditioning.

Useful links (Github issues, JIRA tickets, ML-Agents forum threads etc.)

Types of change(s)

Checklist

Added tests that prove my fix is effective or that my feature works
Updated the changelog (if applicable)
Updated the documentation (if applicable)
Updated the migration guide (if applicable)

Other comments

chriselion · 2021-03-29T21:54:59Z

Project/Assets/ML-Agents/Examples/GridWorld/Scripts/GridAgent.cs

        m_ResetParams = Academy.Instance.EnvironmentParameters;
    }

+    public override void CollectObservations(VectorSensor sensor)
+    {
+        Array values = Enum.GetValues(typeof(GridGoal));


Can this happen somewhere else? It feels like abuse of CollectObservations(), since it's not touching the input VectorSensor.

VectorSensor is null here, and I do not see an issue with this. The goal signal is an observation, so it makes sense to me that it is collected in CollectObservations.
Would it be better if I put this logic into a CollectGoal method with no arguments that I call in CollectObservations?

CollectGoal is maybe fine for the example (but let's not add it to Agent). Let me think about a better way.

One problem (which I didn't realize until now) is that we don't check for a null collectObservationsSensor during the normal update step:

ml-agents/com.unity.ml-agents/Runtime/Agent.cs

Line 1062 in e4e9c51

CollectObservations(collectObservationsSensor);

but we do check for null when the agent is done:

ml-agents/com.unity.ml-agents/Runtime/Agent.cs

Lines 563 to 571 in e4e9c51

    if (collectObservationsSensor != null)
    {
        // Make sure the latest observations are being passed to training.
        collectObservationsSensor.Reset();
        using (m_CollectObservationsChecker.Start())
        {
            CollectObservations(collectObservationsSensor);
        }
    }

chriselion · 2021-03-29T22:23:29Z

Project/Assets/ML-Agents/Examples/GridWorld/Scripts/GridAgent.cs

@@ -105,17 +147,29 @@ public override void OnActionReceived(ActionBuffers actionBuffers)

            if (hit.Where(col => col.gameObject.CompareTag("goal")).ToArray().Length == 1)
            {
-                SetReward(1f);
+                ProvideReward(GridGoal.Plus);


This is pretty confusing since the "goal" tag doesn't really mean that it's the goal anymore. Can you change them to e.g. "plus" and "ex"?

Or maybe this would be a good opportunity to stop using physics collision checks, and change the example to use a 2D array of enums? That would probably speed up training too.

I changed the tags to plus and ex. I think making the grid a 2D array of enums is a good idea, but out of scope for this.
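
For reference, the 2D-array-of-enums idea raised above could look roughly like the following. This is a minimal Python sketch and not code from this PR (the example itself is C#): `Cell`, `Grid`, and `reached_goal` are hypothetical names, and the cell types simply mirror the "plus" and "ex" tags. The point is only that a collision query becomes an index into an array.

```python
from enum import Enum
from typing import List, Optional


class Cell(Enum):
    """Contents of one grid square (hypothetical names mirroring the plus/ex tags)."""
    EMPTY = 0
    PLUS = 1
    EX = 2


class Grid:
    """Square grid stored as a 2D list of Cell values instead of physics colliders."""

    def __init__(self, size: int) -> None:
        self.size = size
        # One independent row list per row, all squares initially empty.
        self.cells: List[List[Cell]] = [[Cell.EMPTY] * size for _ in range(size)]

    def place(self, row: int, col: int, cell: Cell) -> None:
        self.cells[row][col] = cell

    def lookup(self, row: int, col: int) -> Optional[Cell]:
        """Return the cell at (row, col), or None when the position is off the grid."""
        if 0 <= row < self.size and 0 <= col < self.size:
            return self.cells[row][col]
        return None


def reached_goal(grid: Grid, row: int, col: int, goal: Cell) -> bool:
    """True when the agent's new position holds the object named by the current goal."""
    return grid.lookup(row, col) is goal


grid = Grid(5)
grid.place(1, 2, Cell.PLUS)
grid.place(3, 3, Cell.EX)
print(reached_goal(grid, 1, 2, Cell.PLUS))  # True
print(reached_goal(grid, 3, 3, Cell.PLUS))  # False
```

Returning `None` for off-grid positions lets the caller treat walls separately from empty squares; how rewards are assigned for a match or mismatch is left out because it depends on the example's reward design.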

chriselion · 2021-03-29T22:28:09Z

(sorry, can't comment inline for the file removal). gridworld.png is still being referenced:

$ git grep gridworld.png 
docs/Learning-Environment-Design-Agents.md:![Agent RenderTexture Debug](images/gridworld.png)
docs/Learning-Environment-Examples.md:![GridWorld](images/gridworld.png)

(and that should have failed the link checker)

vincentpierre · 2021-03-29T23:32:48Z

> gridworld.png is still being referenced

gridworld.png is still there (it is only smaller).

ervteng · 2021-03-29T23:49:07Z

Might not be related to this PR, but should we add a warning in the docs about using hypernetworks for larger hidden_units values? We might even be able to auto-detect it in settings.py, e.g. print a warning if the resulting model will be bigger than 50 MB.

vincentpierre · 2021-03-30T00:18:26Z

> Might not be related to this PR, but should we add a warning in the docs about using hypernetworks for larger hidden_units values? We might even be able to auto-detect it in settings.py, e.g. print a warning if the resulting model will be bigger than 50 MB.

There is this line in the documentation:

If set to `hyper` (default) a [HyperNetwork](https://arxiv.org/pdf/1609.09106.pdf)
will be used to generate some of the
weights of the policy using the goal observations as input. Note that using a
HyperNetwork requires a lot of computations, it is recommended to use a smaller
number of hidden units in the policy to alleviate this.

I am hesitant to throw a warning if the model is going to be large because we never know what the user has in mind...
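
To make the auto-detection idea concrete, here is a minimal Python sketch of what such a check could look like. Everything in it is an assumption for illustration: the function names are hypothetical and do not exist in settings.py, and the size model is deliberately simplified (a single dense head that generates one `hidden_units` x `hidden_units` weight matrix plus its bias from a `hyper_embedding`-sized vector), which is not necessarily how the ML-Agents hypernetwork is built. The 50 MB threshold is the figure suggested in the comment above.

```python
import warnings

BYTES_PER_FLOAT32 = 4
SIZE_WARNING_MB = 50.0  # threshold taken from the suggestion in this thread


def estimate_hyper_head_mb(hidden_units: int, hyper_embedding: int) -> float:
    """Rough float32 size, in MB, of one hypernetwork output head.

    Assumption: the head is one dense layer mapping a `hyper_embedding`-sized
    vector to a full (hidden_units x hidden_units) weight matrix plus a bias.
    """
    generated = hidden_units * hidden_units + hidden_units  # values the head must produce
    head_params = hyper_embedding * generated + generated   # dense weights + dense bias
    return head_params * BYTES_PER_FLOAT32 / 1e6


def warn_if_large(hidden_units: int, hyper_embedding: int) -> bool:
    """Emit a warning and return True when the estimate exceeds the threshold."""
    size_mb = estimate_hyper_head_mb(hidden_units, hyper_embedding)
    if size_mb > SIZE_WARNING_MB:
        warnings.warn(
            f"Estimated hypernetwork head size is {size_mb:.0f} MB with "
            f"hidden_units={hidden_units}; consider a smaller value."
        )
        return True
    return False
```

Under this simplified model the estimate grows with the square of `hidden_units`, which matches the documentation's advice to use fewer hidden units with `hyper`. Because it is only a warning, it would not block a user who wants a large model on purpose.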

ervteng · 2021-03-31T16:41:41Z

docs/Learning-Environment-Examples.md

@@ -82,16 +82,16 @@ you would like to contribute environments, please see our

 ![GridWorld](images/gridworld.png)


Possible to link to this environment in the goal signal docs and the Changelog? Just in case a user wants an example of how to use these features

vincentpierre added 5 commits March 29, 2021 13:35

Aded the Goal conditioned GridWorld to replace regular gridworld

639c617

adding missing files

9db76ab

Code improvements

c3dba90

Documentation change on gridworld

9ed6aa1

resolving conflicts

8a1737a

vincentpierre self-assigned this Mar 29, 2021

vincentpierre marked this pull request as ready for review March 29, 2021 20:53

vincentpierre changed the title ~~Goal conditioning grid world 3~~ Goal conditioning grid world : Example of goal conditioning Mar 29, 2021

vincentpierre requested review from ervteng and chriselion March 29, 2021 21:42

chriselion reviewed Mar 29, 2021

ervteng reviewed Mar 29, 2021

config/ppo/GridWorld.yaml (outdated, resolved)

new model

a3b1f61

vincentpierre and others added 2 commits March 29, 2021 17:27

Addressing comments

ffe56d0

comments and renames

abd2bf0

chriselion approved these changes Mar 31, 2021

ervteng reviewed Mar 31, 2021

docs/Learning-Environment-Examples.md (outdated, resolved)

ervteng reviewed Mar 31, 2021

ervteng approved these changes Mar 31, 2021

vincentpierre and others added 2 commits March 31, 2021 09:47

Update docs/Learning-Environment-Examples.md

79fff65

Co-authored-by: Ervin T. <[email protected]>

adding reference to gridworld in docs about goal signal

fb869ac

vincentpierre merged commit 92ff2c2 into main Mar 31, 2021

delete-merged-branch bot deleted the goal-conditioning-grid-world-3 branch March 31, 2021 22:17

github-actions bot locked as resolved and limited conversation to collaborators Apr 1, 2022
