
Implement Search & Rescue Multi-Agent Environment #259

Merged: 35 commits, Feb 4, 2025

Conversation

@zombie-einstein (Contributor) commented Nov 4, 2024

Add a multi-agent search and rescue environment where a set of agents has to locate moving targets in a 2D space.

Changes

  • Adds the Esquilax library as a dependency
  • Adds the swarm environment group/type (was not sure the new environment fit into an existing group, but happy to move it if you think it would fit better somewhere else)
  • Implements some common swarm/flock functionality (can be reused if more environments of this type are added)
  • Implements the search and rescue environment and docs

Todo

  • Need to add images and animations; I was waiting to finalise the code before adding them.

* Initial prototype

* feat: Add environment tests

* fix: Update esquilax version to fix type issues

* docs: Add docstrings

* docs: Add docstrings

* test: Test multiple reward types

* test: Add smoke tests and add max-steps check

* feat: Implement pred-prey environment viewer

* refactor: Pull out common viewer functionality

* test: Add reward and view tests

* test: Add rendering tests and add test docstrings

* docs: Add predator-prey environment documentation page

* docs: Cleanup docstrings

* docs: Cleanup docstrings
@CLAassistant commented Nov 4, 2024

CLA assistant check
All committers have signed the CLA.

@zombie-einstein (Contributor Author)

Here you go @sash-a, this is correct now. Will take a look at the contributor license and CI failure now.

@zombie-einstein (Contributor Author)

I think the CI issue is that I've set Esquilax to require Python >=3.10. Seems you've a PR open to upgrade the Python version; is it worth holding on for that?

@sash-a (Collaborator) commented Nov 4, 2024

The Python version PR is merged now, so hopefully it will pass 😄

Should have time during the week to review this, really appreciate the contribution!

@sash-a (Collaborator) left a comment

An initial review with some high-level comments about Jumanji conventions. Will go through it more in depth once these are addressed. In general it's looking really nice and well documented!

Not quite sure about the new swarms package, but also not sure where else we would put it. Especially not sure about it if we only have 1 env and no new ones planned.

One thing I don't quite understand is the benefit of amap over vmap specifically in the case of this env?

Please @ me when it's ready for another review or if you have any questions.

@sash-a (Collaborator) commented Nov 5, 2024

As for your questions in the description:

I only forwarded the Environment import to jumanji.environments; do types also need forwarding somewhere?

Nope just the environment is fine

I didn't add an animate method to the environment, but saw that some others do? Easy enough to add.

Please do add animation, it's a great help.

Do you want defaults for all the environment parameters? Not sure there are really "natural" choices, but could add sensible defaults to avoid some typing.

We do want defaults, I think we can discuss what makes sense.

Are the API docs auto-generated somehow, or do I need to add a link manually?

It's generated with mkdocs; we need an entry in docs/api/environments and mkdocs.yml. See this recently closed PR for an example of which files we change.

One big thing I've realized after my review is that this is missing training code. We like to validate that the env works. I'm not 100% sure if this is possible because the env has two teams, so which reward do you optimize? Maybe train with a simple heuristic, e.g. you are the predator and the prey moves randomly? For examples see the training folder; you should only need to create a network. An example of this should also be in the above PR.

* refactor: Formatting fixes

* fix: Implement rewards as class

* refactor: Implement observation as NamedTuple

* refactor: Implement initial state generator

* docs: Update docstrings

* refactor: Add env animate method

* docs: Link env into API docs
@zombie-einstein (Contributor Author)

Hi @sash-a, just merged changes that I think address all the comments, and add the animate method and API docs link.

Not quite sure about the new swarms package, but also not sure where else we would put it. Especially not sure about it if we only have 1 env and no new ones planned.

Could you have something like a multi-agent package? Don't think you have anything similar at the moment? FYI I was intending to add a couple more swarm/flock type envs if this one went ok.

One thing I don't quite understand is the benefit of amap over vmap specifically in the case of this env?

Yeah, in a couple of cases using it is overkill, a hangover from when I was writing this example with an esquilax demo in mind! Makes sense to use vmap instead if the other arguments are not being used.
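
(Purely for illustration, a minimal sketch of the kind of swap meant here, with made-up names; a per-agent update that uses no shared arguments can be mapped with plain jax.vmap:)

import jax
import jax.numpy as jnp

# Hypothetical per-agent update: the new heading depends only on that agent's
# own heading and turn, so mapping over the agent axis with jax.vmap is enough.
def update_heading(heading, turn):
    return (heading + turn) % (2.0 * jnp.pi)

headings = jnp.zeros((8,))    # e.g. 8 agents
turns = jnp.full((8,), 0.1)
new_headings = jax.vmap(update_heading)(headings, turns)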

@zombie-einstein (Contributor Author)

I'll look at adding something to training next. I think random prey with trained predators makes sense, will look to implement.

@sash-a (Collaborator) commented Nov 6, 2024

Could you have something like a multi-agent package? Don't think you have anything similar at the moment? FYI I was intending to add a couple more swarm/flock type envs if this one went ok.

If you can add more that would be great! Then I'm happy to keep the swarm package as is. What we'd be most interested in is some kind of env with only 1 team and strictly co-operative, like predators vs heuristic prey or vice versa, not sure if you planned to make any envs like this?

But I had a quick look at the changes and it mostly looks great! Will leave an in depth review later today/tomorrow 😄

Also I updated the CI yesterday, we're now using ruff, so you will need to update your pre-commit

@sash-a (Collaborator) commented Nov 6, 2024

One other thing: the only reason I've been hesitant to add this to Jumanji is that it's not that related to industry problems, which is a common focus across all the envs. I was thinking maybe we could re-frame the env from predator-prey to something else (without changing any code, just changing the idea): maybe a continuous cleaner where your target position is changing, or something to do with drones (maybe delivery). Do you have any other ideas, and would you be happy with this?

@zombie-einstein (Contributor Author)

Could you have something like a multi-agent package? Don't think you have anything similar at the moment? FYI I was intending to add a couple more swarm/flock type envs if this one went ok.

If you can add more that would be great! Then I'm happy to keep the swarm package as is. What we'd be most interested in is some kind of env with only 1 team and strictly co-operative, like predators vs heuristic prey or vice versa, not sure if you planned to make any envs like this?

Yeah, I was very interested in developing envs for co-operative multi-agent RL, so was keen to design or implement more environments along these lines. There's a simpler version of this environment which is just the flock, i.e. where the agents move in a co-ordinated way without colliding. I've also seen an environment where the agents have to effectively cover an area that I was going to look at.

Also I updated the CI yesterday, we're now using ruff, so you will need to update your pre-commit

How do I do this? I did try reinstalling pre-commit, but it raised an error that the config was invalid?

@zombie-einstein (Contributor Author)

One other thing: the only reason I've been hesitant to add this to Jumanji is that it's not that related to industry problems, which is a common focus across all the envs. I was thinking maybe we could re-frame the env from predator-prey to something else (without changing any code, just changing the idea): maybe a continuous cleaner where your target position is changing, or something to do with drones (maybe delivery). Do you have any other ideas, and would you be happy with this?

Yeah definitely open to suggestions. I was thinking more in the abstract for this (will the agents develop some collective behaviour to avoid predators) but happy to modify towards something more concrete.

@sash-a (Collaborator) commented Nov 6, 2024

Great to hear on the co-operative MARL front, those both sound like nice envs to have.

How do I do this? I did try reinstalling pre-commit, but it raised an error that the config was invalid?

Couple things to try:

pip install -U pre-commit
pre-commit uninstall
pre-commit install

If this doesn't work, check which pre-commit: it should point to your virtual environment. If it's pointing to your system Python or some other system folder, just uninstall that version and rerun the above.

Yeah definitely open to suggestions. I was thinking more in the abstract for this (will the agents develop some collective behaviour to avoid predators) but happy to modify towards something more concrete.

Agreed it would be nice to keep it abstract for the sake of research, but I think it's nice that this env suite is all industry focused. I quite like something to do with drones, which seems quite industry relevant, although we must definitely avoid anything to do with war. I'll give it a think.

@zombie-einstein (Contributor Author)

Hi @sash-a, fixed the formatting and consolidated the predator-prey type.

@sash-a (Collaborator) commented Nov 7, 2024

Thanks, I'll try to have a look tomorrow; sorry, the previous 2 days were a bit busier than expected.

For the theme I think maritime search and rescue works well. It's relatively real world and fits the current dynamics.

@zombie-einstein (Contributor Author)

Thanks, I'll try to have a look tomorrow; sorry, the previous 2 days were a bit busier than expected.

For the theme I think maritime search and rescue works well. It's relatively real world and fits the current dynamics.

Thanks, no worries. Actually yeah, funnily enough a co-ordinated search was something I'd been looking into. Yeah, could have one set of agents have some drift with random movements that need to be found inside the simulated region.
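
(Just a sketch of that idea under assumed names, not the dynamics that ended up in the environment: targets drift with a small random step each update, wrapped back into a unit-square region.)

import jax
import jax.numpy as jnp

# Hypothetical target update: constant drift plus Gaussian noise, positions
# wrapped modulo 1.0 so targets stay inside the unit square.
def drift_targets(key, positions, drift=0.001, noise=0.002):
    steps = noise * jax.random.normal(key, positions.shape)
    return (positions + drift + steps) % 1.0

key = jax.random.PRNGKey(0)
init_key, step_key = jax.random.split(key)
positions = jax.random.uniform(init_key, (40, 2))   # e.g. 40 targets
positions = drift_targets(step_key, positions)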

@sash-a (Collaborator) commented Nov 8, 2024

Sorry still didn't have time to review today and Mondays are usually super busy for me, but I'll get to this next week!

As for the theme do you think we should then change the dynamics a bit to make prey heuristically controlled to move sort of randomly?

@zombie-einstein (Contributor Author)

Sorry still didn't have time to review today and Mondays are usually super busy for me, but I'll get to this next week!

As for the theme do you think we should then change the dynamics a bit to make prey heuristically controlled to move sort of randomly?

No worries, sure I'll do a revision this weekend!

* feat: Prototype search and rescue environment

* test: Add additional tests

* docs: Update docs

* refactor: Update target plot color based on status

* refactor: Formatting and fix remaining typos.
@zombie-einstein (Contributor Author)

Hi @sash-a, this turned into a larger rewrite (sorry for the extra review work, let me know if you want me to close this PR and just start with a fresh one), but I think it's a more realistic scenario:

  • A team of agents is searching for targets in the environment region
  • Targets are controlled by a fixed update algorithm (that has an interface to allow other behaviours)
  • Agents are only rewarded the first time a target is located
  • To detect a target they must come within a fixed range of it.
  • Agents visualise the local environment, i.e. the location of other agents in their vicinity.
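
(A rough sketch of the detection and one-time reward rule just described, assuming hypothetical names and a fixed detection range; this is illustrative, not the PR's actual implementation.)

import jax.numpy as jnp

# Hypothetical check for a single searcher: reward each target that is inside
# the detection range and has not been found before, then update the flags.
def first_detection_reward(searcher_pos, target_pos, found, detection_range=0.05):
    dists = jnp.linalg.norm(target_pos - searcher_pos[None, :], axis=1)
    newly_found = (dists < detection_range) & ~found
    reward = jnp.sum(newly_found).astype(jnp.float32)
    return reward, found | newly_found

searcher = jnp.array([0.5, 0.5])
targets = jnp.array([[0.52, 0.5], [0.9, 0.9]])
found = jnp.array([False, False])
reward, found = first_detection_reward(searcher, targets, found)  # reward == 1.0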

A couple of choices we may want to consider:

  • Agents are individually rewarded; we could have some interface for reward shaping (to promote co-operation), but could also leave this external to the environment for the user to implement?
  • At the moment agents only visualise other neighbours. A twist on this I considered was once targets are revealed they are then visualised (i.e. can be seen) by each agent as part of their local view.
  • Do we want to scale rewards with how quickly targets are found? It feels like it would make sense.
  • I've assigned a fixed number of steps to locate the targets, but it also seems it would make sense to terminate the episode when all are located?
  • As part of the observation I've included the remaining steps and targets as normalised floats, but not sure if you have some convention for values like this (i.e. just use integer values and let users rescale them)

@sash-a (Collaborator) commented Nov 12, 2024

Thanks for this @zombie-einstein, I'll start having a look now 😄
I think leave the PR as is, no need to create a new one.

that has an interface to allow other behaviors

awesome!

Agents are only rewarded the first time a target is located

Agreed I think we should actually hide targets once they are located so as to not confuse other agents.

Agents are individually rewarded

I think individual is fine and users can sum it externally if they want, e.g. we do this in Mava for connector.

At the moment agents only visualise other neighbours. A twist on this I considered was once targets are revealed they are then visualised (i.e. can be seen) by each agent as part of their local view.

Not quite following what you mean here. I would say an agent should observe all agents and targets (that have not yet been rescued) within their local view.

Do we want to scale rewards with how quickly targets are found? It feels like it would make sense.

Maybe add this as an optional reward type. I think I prefer 1 if a target is saved and 0 otherwise; it makes the env quite hard, but we should test what works best.
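
(Hedged sketch of the two options being weighed, with illustrative names: a flat 1 per rescue versus a reward scaled by how early in the episode the target is found.)

import jax.numpy as jnp

# Flat option: 1.0 per target found at this step, 0.0 otherwise.
def sparse_reward(num_newly_found):
    return jnp.asarray(num_newly_found, dtype=jnp.float32)

# Scaled option: earlier rescues are worth more, decaying linearly to zero
# at the final step of the episode.
def time_scaled_reward(num_newly_found, step, max_steps):
    scale = 1.0 - step / max_steps
    return scale * jnp.asarray(num_newly_found, dtype=jnp.float32)

sparse_reward(2)                               # 2.0
time_scaled_reward(2, step=50, max_steps=200)  # 1.5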

I've assigned a fixed number of steps to locate the targets, but it also seems it would make sense to terminate the episode when all are located?

Definitely!

As part of the observation I've included the remaining steps and targets as normalised floats, but not sure if you have some convention for values like this (i.e. just use integer values and let users rescale them)

We don't have a convention for this. I wouldn't add remaining steps to the obs directly; I don't see why the algorithm would need that, although again it needs to be tested. Agreed with remaining targets, it makes sense to observe that. I think normalised floats make sense.

@sash-a (Collaborator) left a comment

Amazing job with this rewrite, haven't had time to fully look at everything but it does look great so far!

Some high level things:

  • Please add a generator, dynamics and viewer test (see examples of the viewer test for other envs)
  • Can you also add tests for the common/updates
  • Can you start looking into the networks and testing for jumanji

Sorry, these are somewhat tedious tasks, but I really like the env we've landed on 😄

@zombie-einstein (Contributor Author)

Thanks @sash-a, just a couple of follow-ups to your questions:

At the moment agents only visualise other neighbours. A twist on this I considered was once targets are revealed they are then visualised (i.e. can be seen) by each agent as part of their local view.

Not quite following what you mean here. I would say an agent should observe all agents and targets (that have not yet been rescued) within their local view.

So I was picturing (and as currently implemented) a situation where the searchers have to come quite close to the targets to "find" them (as if they are obscured/hard to find), but the agents have a larger vision range to visualise the location of other searcher agents (to allow them to improve search patterns, for example).

My feeling was that this created more of a search task, whereas if the targets were part of their larger vision range it feels like it could become more of a routing-type task.

I then thought it may be good to include found targets in the vision to allow agents to visualise the density of located targets.
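
(To make that concrete, a small illustrative sketch with assumed names and values: a short detection range for "finding" targets and a wider vision range within which other searchers, and only already-found targets, are visible.)

import jax.numpy as jnp

# Hypothetical radii: finding a target requires getting much closer than the
# range at which other searchers (and already-found targets) can be seen.
DETECTION_RANGE = 0.02
VISION_RANGE = 0.1

def in_range(agent_pos, other_pos, radius):
    dists = jnp.linalg.norm(other_pos - agent_pos[None, :], axis=1)
    return dists < radius

def local_view(agent_pos, searcher_pos, target_pos, targets_found):
    visible_searchers = in_range(agent_pos, searcher_pos, VISION_RANGE)
    # Unfound targets stay hidden; found ones appear in the wider vision range.
    visible_targets = in_range(agent_pos, target_pos, VISION_RANGE) & targets_found
    return visible_searchers, visible_targets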

As part of the observation I've included the remaining steps and targets as normalised floats, but not sure if you have some convention for values like this (i.e. just use integer values and let users rescale them)

We don't have a convention for this. I wouldn't add remaining steps to the obs directly; I don't see why the algorithm would need that, although again it needs to be tested. Agreed with remaining targets, it makes sense to observe that. I think normalised floats make sense.

I thought that if we treat it as a time-sensitive task, some indication of the remaining time to find targets could be a useful feature of the observation.

Please add a generator, dynamics and viewer test (see examples of the viewer test for other envs)
Can you also add tests for the common/updates
Can you start looking into the networks and testing for jumanji

Yup will do!

@zombie-einstein (Contributor Author)

Hi @sash-a, I think this is ready to look over now. The only remaining task is to add an animated image to the docs.

There's also a mypy bug that came up due to a recent change in mypy and was causing CI to fail, so I grabbed that here.

@sash-a (Collaborator) left a comment

I haven't looked at the training stuff, does it work or did you end up using Mava? If so I think you can remove the training stuff and keep only the random agent.

Otherwise just a few questions where I think there might be mistakes. Once this is done and the gif is added I want to try running it locally; what branch do you have it working on in Mava?

@zombie-einstein (Contributor Author)

I haven't looked at the training stuff, does it work or did you end up using Mava? If so I think you can remove the training stuff and keep only the random agent.

The training stuff in this PR works well where all the targets are visible, but I ended up using Mava for the harder problem where targets are only shown once found (so I could use a recurrent network and the centralised observation).

Otherwise just a few questions where I think there might be mistakes. Once this is done and the gif is added I want to try running it locally; what branch do you have it working on in Mava?

I was working off the branch you started, but I actually meant to ask if you can add me as a contributor to that branch, so I can push the changes I've made.

@sash-a (Collaborator) commented Jan 15, 2025

The training stuff in this PR works well where all the targets are visible, but I ended up using Mava for the harder problem where targets are only shown once found (so I could use a recurrent network and the centralised observation).

Ok great, I'll review that later this week then.

I was working off the branch you started, but I actually meant to ask if you can add me as a contributor to that branch, so I can push the changes I've made.

Unfortunately I can't add you as a contributor to Mava, but if you fork the repo you should be able to commit your changes to the forked repo.

* Tweak env docs

* Env tweaks/refactors

* Separate searcher and target view ranges and consolidate observation-fn

* Consolidate reward functions

* Implement randomly accelerating target dynamics

* Tweak parameters
@zombie-einstein (Contributor Author)

Hey @sash-a, merged fixes for the above comments. Consolidated a lot of the observation and reward functionality, and added distinct target and agent vision ranges. May need to tweak some of the default parameters, but otherwise it should be ready to look over.

@sash-a (Collaborator) commented Jan 20, 2025

All these changes look great! The question is how does the learning perform now?

* Remove unused parameter

* Add tests and tweak parameters
@zombie-einstein (Contributor Author)

All these changes look great! The question is how does the learning perform now?

[training reward plot]

So this is 4 searchers and 40 targets, with a narrower FOV and the smaller target vision range, using rec_ippo in Mava. They continue to get quicker at completing the task, though the relative increase in rewards slows.

There's maybe some tweaking to do with the parameters; performance seems to be relatively sensitive to small changes.

With the larger observation space I was actually struggling to get the shared observation for MAPPO working effectively.

I'll make sure to push my Mava changes to my fork today!

@zombie-einstein (Contributor Author)

@sash-a my Mava fork is here

@zombie-einstein (Contributor Author)

rec-mappo (green) is working now, but no gain over rec-ippo (yellow).

[training reward curves for rec-mappo vs rec-ippo]

@sash-a (Collaborator) commented Jan 27, 2025

Thanks @zombie-einstein, this looks amazing. Maybe a little easy given how quickly it seems to find the optimal strategy, but it clearly is learning! Honestly having an easy default isn't a bad thing.

I'm pretty much happy with all the code, I just want to try running your fork locally and play around with the difficulty myself. My week is just a bit busy, but I will get to it next week and then we should be able to merge it! 🚀

(Sorry this has taken so long)

@zombie-einstein (Contributor Author)

Great, thanks @sash-a, no worries; it's been an interesting process working through it, happy with the final result!

@sash-a (Collaborator) left a comment

Just some minor changes to the distributions, and also could you please add the gif and image and update the readme 🙏

@sash-a (Collaborator) commented Feb 3, 2025

Two more small requests: can you please use a colour palette like we do in other envs so that the searcher and target colours are a bit more complementary (still keep it 1 colour for all searchers and 1 for all targets).

And then could you also change the colour of the found targets, so 3 colours total from the colour map; it should look something like:

import matplotlib

color_map = matplotlib.cm.get_cmap("hsv", 3)
searcher_color = color_map(0)
target_color = color_map(1)
found_target_color = color_map(2)

(this is untested, but I think that's how it works)

* Sample viewer colours from map, and adjust quiver params

* Unwrap distrax distributions

* Add search-and-rescue option to comments

* Add env animation
@zombie-einstein (Contributor Author)

Hi @sash-a, I think that's all the comments addressed in the last merge.

@sash-a (Collaborator) commented Feb 4, 2025

Sorry, last bit of admin; I'll do this as it seems I have access to your repo:

  • Readme table needs to be updated with the search and rescue env
  • We need an env image in docs/env_img
  • That image needs to be added to the existing images at the top of the readme

PS: I see the targets were already changing colour when they were found; it must just not have been happening in the random episode I visualized. I prefer the way you had it originally, I'll change it back 😂

@zombie-einstein (Contributor Author)

PS: I see the targets were already changing colour when they were found; it must just not have been happening in the random episode I visualized. I prefer the way you had it originally, I'll change it back.

What was the issue here? I used the updated version to produce the animation, so I thought it was working ok?

@zombie-einstein (Contributor Author) left a comment

Thanks, just realised I only tested the training with a random policy after this change!

@sash-a (Collaborator) commented Feb 4, 2025

What was the issue here? I used the updated version to produce the animation, so I thought it was working ok?

There was no issue; I didn't realize it was changing colours before because I was visualizing a random episode that never found any targets. So I just reverted it to what you had originally and made the colours a little nicer.

@sash-a (Collaborator) left a comment

Thanks for the amazing work on this @zombie-einstein, it is very much appreciated! 🥇

@sash-a merged commit febad96 into instadeepai:main Feb 4, 2025
4 checks passed
@zombie-einstein (Contributor Author)

Thanks for the amazing work on this @zombie-einstein, it is very much appreciated! 🥇

Awesome, thanks @sash-a 🎉, and thanks for your time and patience also!

I've an idea for another environment that could use quite a few of the mechanics from this, so will open an issue soon(ish).

@sash-a (Collaborator) commented Feb 4, 2025

Awesome, hopefully subsequent PRs will be quicker!
