
[Feature Request] basic dict/tuple support for observations #216

Closed
JadenTravnik opened this issue Nov 3, 2020 · 7 comments · Fixed by #243

JadenTravnik commented Nov 3, 2020

🚀 Feature

As mentioned in the RoadMap, adding dict/tuple support for observations is a planned feature. This follows the OpenAI Gym API, which includes Tuple and Dict as possible observation spaces.

Motivation

Currently, Stable-Baselines3 supports only a single observation (an image or a vector). Extending this to Tuple/Dict observations would add support for environments that provide several different inputs.

Current Plan

I plan on implementing this feature but I'd like to have some pointers on how to go about it.
Below is my current plan but I'd really like to verify it as a good way forward.

I think that I need to create a child class of RolloutBufferSamples which stores a list/dict of observations rather than a single observation.
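For illustration, such a class might look like this minimal sketch (the name DictRolloutBufferSamples is hypothetical; the fields are assumed to mirror the existing RolloutBufferSamples NamedTuple):

```python
from typing import Dict, NamedTuple

import torch as th


class DictRolloutBufferSamples(NamedTuple):
    # Hypothetical sketch: identical to RolloutBufferSamples except that
    # observations maps each sub-space key to its own batch of tensors.
    observations: Dict[str, th.Tensor]
    actions: th.Tensor
    old_values: th.Tensor
    old_log_prob: th.Tensor
    advantages: th.Tensor
    returns: th.Tensor
```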

However, this may require adding a bool on the rollout_buffer itself so that the conversion to tensor (see on_policy_algorithm.py) can be performed over each element of the list/dict. It's not my favorite approach and I'd like to avoid it if possible.

From here, I think that the other necessary changes would permeate through the repository:

  • add a "CombinedExtractor" in torch_layers.py that can take in multiple observations.
  • add a new Policy for each algorithm to use the new extractors
  • modify util.py and preprocessing.py to handle the new rollout type

Is this a good approach?

@JadenTravnik JadenTravnik added the enhancement New feature or request label Nov 3, 2020
@araffin araffin added this to the v1.1 milestone Nov 3, 2020
araffin (Member) commented Nov 4, 2020

Hello,

Thank you for creating the issue.
First things first: you are mentioning only the on-policy algorithms here, but we should not forget the off-policy ones (although I'm ok with having two separate PRs).

As mentioned in the RoadMap, adding dict/tuple support for observations is a planned feature.

yes, for v1.1+ (so after v1.0, which should be released soon).

I think that I need to create a child class of RolloutBufferSamples which stores a list/dict of observations rather than a single observation.

I'm not sure if a child class is needed (as the dict case is more general), but it may be cleaner. A draft PR would help clarify that point, I think. @partiallytyped mentioned separating storage and sampling in #81 (see #81 (comment)).

this may require adding a bool on the rollout_buffer itself

why can't you treat both cases with one helper function?

It's not my favorite approach and I'd like to avoid it if possible.

What would you propose instead?

add a "CombinedExtractor" in torch_layers.py that can take in multiple observations.

yes, we need to create some rules too: preprocess each observation and then concatenate the ones that can be concatenated (for instance images with images or 1D vector with 1D vector).

add a new Policy for each algorithm to use the new extractors

this is in theory not needed, as you only need to change the features_extractor_class when creating the policy, but I'm not sure yet.
I would probably document how to use dict observations but not provide dedicated policies. That needs to be discussed with the other maintainers.

modify util.py and preprocessing.py to handle the new rollout type

you will need to modify the predict method in common/policies.py too.

EDIT: I would definitely treat the Tuple case as a special case of the Dict (where the keys are integers).
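That reduction might be sketched as follows (the helper name is hypothetical; string keys are used since gym's Dict space conventionally uses string keys):

```python
import gym.spaces as spaces


def tuple_space_as_dict(space: spaces.Tuple) -> spaces.Dict:
    # Hypothetical helper: re-express a Tuple space as a Dict space whose
    # keys are the (stringified) indices of the sub-spaces.
    return spaces.Dict({str(i): subspace for i, subspace in enumerate(space.spaces)})
```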

@araffin araffin changed the title [Feature Request] basic dict/tuple support for observations [Feature Request] basic dict/tuple support for observations (on-policy) Nov 4, 2020
JadenTravnik (Author)
Thanks for the feedback :)

why can't you treat both cases with one helper function?

That's a good idea: converting the observations to tensors can be handled by a util method, and then the policy can use them directly.
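Such a helper might look like this sketch (the name obs_to_tensor is illustrative, assuming observations arrive as NumPy arrays):

```python
from typing import Dict, Union

import numpy as np
import torch as th


def obs_to_tensor(
    obs: Union[np.ndarray, Dict[str, np.ndarray]],
    device: th.device,
) -> Union[th.Tensor, Dict[str, th.Tensor]]:
    # Handle both the plain-array case and the dict case in one place,
    # so policies and buffers do not need to branch on the observation type.
    if isinstance(obs, dict):
        return {key: th.as_tensor(value, device=device) for key, value in obs.items()}
    return th.as_tensor(obs, device=device)
```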

yes, we need to create some rules too: preprocess each observation and then concatenate the ones that can be concatenated (for instance images with images or 1D vector with 1D vector).

I don't think that this is wanted or possible in all cases. For instance, when using multiple images, they need not have the same dimensions, and it's not clear how one would concatenate them in that case. Additionally, there may be situations in which one wants to create their own version of a "CombinedExtractor" and encode certain 1D vectors separately (e.g., goal-conditioned agents). I think that concatenating the observations this way inhibits the full use of Dict/Tuple observations.

After some more browsing, the biggest issue I see is with the environment wrappers. Specifically, VecTransposeImage and VecFrameStack, for a few reasons:

  • There are multiple checks throughout the codebase that call is_image_space(observation_space), which may have to be replaced by has_image_space(observation_space) to support the Dict/Tuple hierarchy (see the sketch after this list).
  • VecTransposeImage.transpose_image(observation) would need to handle transposing all of the images within a Dict/Tuple observation. Otherwise, there would need to be a check in policies.py that calls transpose_image for each image observation.
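A recursive check along those lines could look like this sketch (has_image_space is the proposed name; is_image_space is the existing helper in common/preprocessing.py):

```python
import gym.spaces as spaces

from stable_baselines3.common.preprocessing import is_image_space


def has_image_space(observation_space: spaces.Space) -> bool:
    # Recurse into Dict/Tuple spaces and report whether any leaf is an image.
    if isinstance(observation_space, spaces.Dict):
        return any(has_image_space(sub) for sub in observation_space.spaces.values())
    if isinstance(observation_space, spaces.Tuple):
        return any(has_image_space(sub) for sub in observation_space.spaces)
    return is_image_space(observation_space)
```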

Perhaps these issues can be avoided by using something akin to the new ObsDictWrapper (I had an old version of the develop branch when I first looked into this issue :) )

I haven't looked deeper into it yet, but I can imagine that such a wrapper would handle stacking and transforming the different "subspaces".

Thoughts?

araffin (Member) commented Nov 12, 2020

For instance, when using multiple images, they need not have the same dimensions, and it's not clear how one would concatenate them in that case.

Good point, however, creating a separate CNN for each image type may be quite costly...

After some more browsing, the biggest issue I see is with the environment wrappers. Specifically, VecTransposeImage and VecFrameStack, for a few reasons:

Short answer: we won't add support for dict observations to VecFrameStack at first.
Support for dict obs in VecTransposeImage should not be too difficult to implement, but we can skip it at first (and assume channel-first images).
And you are right, we probably need to update the is_image_space checks.

Perhaps these issues can be avoided by using something akin to the new ObsDictWrapper

? ObsDictWrapper is a hack to flatten spaces while waiting for full dict support.

JadenTravnik (Author) commented Nov 16, 2020

So I think I have been able to solve the VecFrameStack and the VecTransposeImage issues. :) I created a CombinedExtractor that, for now, creates an MLP for each Box observation space that isn't an image and a CNN for each Box observation space that is. The features from all of them are then concatenated and put through another MLP before being given to the policy.
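For context, a condensed sketch of what that extractor might look like (the actual PR may differ; mlp_dim, the CNN output size, and features_dim are illustrative):

```python
from typing import Dict

import gym
import numpy as np
import torch as th
import torch.nn as nn

from stable_baselines3.common.preprocessing import is_image_space
from stable_baselines3.common.torch_layers import BaseFeaturesExtractor, NatureCNN


class CombinedExtractor(BaseFeaturesExtractor):
    # Sketch: one sub-extractor per sub-space, features concatenated and
    # passed through a final MLP before reaching the policy.
    def __init__(self, observation_space: gym.spaces.Dict, features_dim: int = 256, mlp_dim: int = 64):
        # features_dim is finalized once the sub-extractors are built
        super().__init__(observation_space, features_dim=1)

        extractors = {}
        concat_dim = 0
        for key, subspace in observation_space.spaces.items():
            if is_image_space(subspace):
                # CNN for image sub-spaces (assumes channel-first images,
                # per the discussion above)
                extractors[key] = NatureCNN(subspace, features_dim=256)
                concat_dim += 256
            else:
                # Small MLP for each non-image Box sub-space
                in_dim = int(np.prod(subspace.shape))
                extractors[key] = nn.Sequential(nn.Flatten(), nn.Linear(in_dim, mlp_dim), nn.ReLU())
                concat_dim += mlp_dim

        self.extractors = nn.ModuleDict(extractors)
        # Final MLP over the concatenated features
        self.linear = nn.Sequential(nn.Linear(concat_dim, features_dim), nn.ReLU())
        self._features_dim = features_dim

    def forward(self, observations: Dict[str, th.Tensor]) -> th.Tensor:
        encoded = [extractor(observations[key]) for key, extractor in self.extractors.items()]
        return self.linear(th.cat(encoded, dim=1))
```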

Good point, however, creating a separate CNN for each image type may be quite costly...

One can override this functionality in a custom extractor.

Note: I'll be making a pull request as soon as it goes through an internal audit

I tried to avoid using the ObsDictWrapper so as to keep the functionality of HER intact, and instead focused on how the data itself was being parsed. This leads to about a dozen isinstance(self.observation_space, dict) checks throughout the codebase.

I'm currently making sure that all tests pass. Oddly enough, most of the failing tests are in test_her, because the observation_space of the BitFlippingEnv is a dict and thus gives TypeError: can only concatenate tuple (not "dict") to tuple. This is odd because I was intentionally working around HER, so it will require further investigation.

araffin (Member) commented Nov 17, 2020

So I think I have been able to solve the VecFrameStack and the VecTransposeImage issues.

Good to hear =)

Btw, I think we should limit support to "first-level" dicts only (so we forbid dicts of dicts).

Note: I'll be making a pull request as soon as it goes through an internal audit

Ok, anyway, we will discuss it there too ;)

creates an MLP for each Box observation space that isn't an image and a CNN for each Box observation space that is.

I would rather concatenate every Box and then use only one MLP. Unlike images, a Box can always be flattened to a 1D vector.
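For example (a sketch; box_keys is assumed to list the keys of the non-image Box sub-spaces):

```python
from typing import Dict, List

import torch as th


def concat_box_observations(observations: Dict[str, th.Tensor], box_keys: List[str]) -> th.Tensor:
    # Flatten each non-image Box observation to (batch, -1) and concatenate,
    # yielding a single 1D vector per sample so one shared MLP suffices.
    flat = [th.flatten(observations[key], start_dim=1) for key in box_keys]
    return th.cat(flat, dim=1)
```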

I tried to avoid using the ObsDictWrapper so as to keep the functionality of the HER intact and instead focused on how the data itself was being parsed.

That's ok; in fact, as soon as we have proper dict support, the HER implementation should be rewritten.

This is odd because I was intentionally working around HER, so it will require further investigation.

So you need to add a check to prevent the env from being wrapped with an ObsDictWrapper, but still wrap it for now when using HER.
One easy check is to see whether it is a dict obs and the keys "observation", "achieved_goal" and "desired_goal" are present.
In that case, you should use ObsDictWrapper; otherwise, you can use your dict support ;)
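That check might be sketched as follows (should_use_obs_dict_wrapper is a hypothetical name):

```python
import gym.spaces as spaces

GOAL_KEYS = {"observation", "achieved_goal", "desired_goal"}


def should_use_obs_dict_wrapper(observation_space: spaces.Space) -> bool:
    # Keep the ObsDictWrapper only for goal-based (HER-style) dict spaces;
    # every other dict space goes through the new dict support directly.
    return isinstance(observation_space, spaces.Dict) and GOAL_KEYS.issubset(
        observation_space.spaces.keys()
    )
```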

J-Travnik (Contributor)
@araffin I created the PR. There are still a few things that need to be done to complete the integration, but I want to be sure it falls in line with the repo, so I am creating the request early:

  1. rewriting HER and the HER tests to accommodate these changes, as currently all of the HER tests fail
  2. incorporating the new multi_input_envs and multi_input_tests into the test suite
  3. running make format, make check-codestyle, etc. (I haven't been able to get this to run on Windows yet)
  4. implementing a combined extractor that combines images of the same size and all vector observations (Box spaces) together
  5. I don't think I am fully taking advantage of other code throughout the repo (e.g., vec_env/utils.py) that uses OrderedDicts, so perhaps there is some overlap to exploit there

I will be working on 2) and 4) for the time being.

araffin (Member) commented Nov 24, 2020

incorporating the new multi_input_envs and multi_input_tests into the test suite

After a quick look, I'm not sure we need that much complexity.
You could probably use the BitFlippingEnv together with a small dummy custom env.

implementing a combined extractor that combines images of the same size and all vector observations (Box spaces) together

it would be nice to have at least one common network for all the Box spaces.

After a quick look at your PR, I think it would be better to almost duplicate the buffer classes, because having too many branches (with the if/else) is hard to follow.

From now on, I think I will try to write comments in the PR only ;)
I will try to have a deeper look this week.

running make format, make check-codestyle, etc. (I haven't been able to get this to run on Windows yet)

I can run it on my side if you want and push the results. But it would be better for you to set up black and isort at least (you can take a look at the Makefile for the details of the commands).

@araffin araffin mentioned this issue Nov 30, 2020
@araffin araffin changed the title [Feature Request] basic dict/tuple support for observations (on-policy) [Feature Request] basic dict/tuple support for observations Dec 1, 2020
@araffin araffin pinned this issue Feb 27, 2021
@araffin araffin unpinned this issue May 11, 2021