[Feature Request] basic dict/tuple support for observations #216
Hello, thank you for creating the issue.
Yes, for v1.1+ (so after v1.0, which should be released soon).
I'm not sure if a child class is needed (as the dict case is more general) but it may be cleaner. A draft PR would help to clarify that point I think. @partiallytyped mentioned separating storage and sampling in #81 (see #81 (comment)).
> Why can't you treat both cases with one helper function?

What would you propose instead?
Yes, we need to create some rules too: preprocess each observation and then concatenate the ones that can be concatenated (for instance, images with images, or 1D vectors with 1D vectors).
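A rough sketch of what such a preprocessing rule could look like (a hypothetical helper, not SB3 code; the grouping heuristic and the helper name are assumptions):

```python
import numpy as np

def group_and_concatenate(obs_dict):
    """Hypothetical helper: preprocess each entry of a dict observation,
    then concatenate the compatible ones (images with images,
    1D vectors with 1D vectors)."""
    images, vectors = [], []
    for key in sorted(obs_dict):
        obs = np.asarray(obs_dict[key])
        if obs.ndim == 3:
            # (H, W, C) image-like entry
            images.append(obs)
        else:
            # everything else is flattened to a 1D vector
            vectors.append(obs.ravel())
    # Images can only be stacked along the channel axis if H and W match.
    stacked_images = np.concatenate(images, axis=-1) if images else None
    flat_vector = np.concatenate(vectors) if vectors else None
    return stacked_images, flat_vector
```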
This is in theory not needed, as you only need to change the [...].
You will need to modify the predict method in [...]. EDIT: I would definitely treat the Tuple case as a special case of the Dict (where keys are integers).
Thanks for the feedback :)
That's a good idea: converting the observations to tensors can be handled by a util method, and then the policy can use them directly.
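For instance, a minimal sketch of such a util method (the name `obs_to_tensor` is hypothetical, not SB3 API):

```python
import numpy as np
import torch as th

def obs_to_tensor(obs, device):
    """Hypothetical util: convert an observation (array, dict of arrays,
    or tuple of arrays) to torch tensors on the given device."""
    if isinstance(obs, dict):
        return {key: th.as_tensor(np.asarray(o), device=device) for key, o in obs.items()}
    if isinstance(obs, tuple):
        return tuple(th.as_tensor(np.asarray(o), device=device) for o in obs)
    return th.as_tensor(np.asarray(obs), device=device)
```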
I don't think that this is wanted or possible in all cases. For instance, in the situation where one is using multiple images, they need not be the same dimension, and it's not clear how one would concatenate them in this case. Additionally, there may be situations in which one will want to create their own version of a "CombinedExtractor" and encode certain 1D vectors separately (e.g., goal-conditioned agents). I think that concatenating the observations this way inhibits the full usage of Dict/Tuple observations. After some more browsing, the biggest issue I see is with the Environment Wrappers. Specifically, [...]
Perhaps these issues can be avoided by using something akin to the new ObsDictWrapper (I had an old version of develop when I was first looking into this issue :) ). I haven't looked deeper into it yet, but I can imagine that such a wrapper would handle stacking and transforming the different "subspaces". Thoughts?
Good point; however, creating a separate CNN for each image type may be quite costly...
Short answer: we won't add dict support for [...].
So I think I have been able to solve the VecFrameStack and the VecTransposeImage issues. :) I created a CombinedExtractor that, for now, creates an MLP for each Box observation space that isn't an image and a CNN for each Box observation that is. The features from all of them are then concatenated together and put through another MLP before being given to the policy.
One can override this functionality in a custom extractor. Note: I'll be making a pull request as soon as it goes through an internal audit. I tried to avoid using the [...]. I'm currently making sure that all tests pass; oddly enough, most of the failing tests are in [...].
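A compressed sketch of the extractor described above (not the actual PR code; the hidden sizes, the image heuristic, and the CNN architecture are placeholders):

```python
import numpy as np
import gym
import torch as th
from torch import nn

def is_image_space(space: gym.spaces.Box) -> bool:
    # Simplified heuristic: 3D uint8 Box spaces are treated as images.
    return len(space.shape) == 3 and space.dtype == np.uint8

class CombinedExtractor(nn.Module):
    """Sketch: a CNN per image subspace, an MLP per non-image Box subspace,
    all features concatenated and passed through a final MLP."""

    def __init__(self, observation_space: gym.spaces.Dict, features_dim: int = 64):
        super().__init__()
        extractors = {}
        total_size = 0
        for key, subspace in observation_space.spaces.items():
            if is_image_space(subspace):
                # Assumes channel-first (C, H, W) images, e.g. after VecTransposeImage.
                n_channels = subspace.shape[0]
                extractors[key] = nn.Sequential(
                    nn.Conv2d(n_channels, 32, kernel_size=8, stride=4),
                    nn.ReLU(),
                    nn.AdaptiveAvgPool2d((1, 1)),
                    nn.Flatten(),
                )
                total_size += 32
            else:
                flat_dim = int(np.prod(subspace.shape))
                extractors[key] = nn.Sequential(nn.Flatten(), nn.Linear(flat_dim, 64), nn.ReLU())
                total_size += 64
        self.extractors = nn.ModuleDict(extractors)
        self.final_mlp = nn.Sequential(nn.Linear(total_size, features_dim), nn.ReLU())

    def forward(self, observations: dict) -> th.Tensor:
        # Encode each subspace separately, then fuse the features.
        encoded = [extractor(observations[key]) for key, extractor in self.extractors.items()]
        return self.final_mlp(th.cat(encoded, dim=1))
```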
Good to hear =) Btw, I think we should limit support to only "first-level" dicts (so we forbid dicts of dicts).
Ok, anyway, we will discuss it there too ;)
I would rather concatenate every Box and then use only one MLP. Unlike images, a Box can always be flattened to a 1D vector.
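A tiny sketch of that alternative (assuming batched tensors as input; the helper name and the key ordering by sort are assumptions):

```python
import torch as th

def concat_boxes(obs_dict, box_keys):
    """Sketch: flatten every non-image Box entry and concatenate them
    so that a single MLP can process the result."""
    flat = [obs_dict[key].flatten(start_dim=1).float() for key in sorted(box_keys)]
    return th.cat(flat, dim=1)
```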
That's ok; in fact, as soon as we have proper dict support, the HER implementation should be rewritten.
So you need to add a check to prevent the env from being wrapped with a [...].
@araffin I created the PR. There are still a few things that need to be done to complete the integration, but I want to be sure it falls in line with the repo, so I am creating the request early.
I will be working on 2) and 4) for the time being.
After a quick look, I'm not sure we need that much complexity.
It would be nice to have at least one common [...] for the Box. After a quick look at your PR, I think it would be better to almost duplicate the buffer classes, because having too many branches (with the if/else) is hard to follow. From now on, I think I will try to write comments in the PR only ;)
I can run it on my side if you want and push the results, but it would be better for you to set up black and isort at least (you can take a look at the Makefile for the details of the commands).
🚀 Feature
As mentioned in the RoadMap, adding dict/tuple support for observations is a planned feature. This follows the OpenAI Gym API, which has Tuple and Dict as possible observation spaces.
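As a concrete illustration (a minimal sketch using the standard gym API; the space contents are made up for this example):

```python
import numpy as np
from gym import spaces

# A Dict observation space mixing an image with two 1D vectors,
# as used e.g. by goal-conditioned environments.
dict_space = spaces.Dict(
    {
        "camera": spaces.Box(low=0, high=255, shape=(84, 84, 3), dtype=np.uint8),
        "achieved_goal": spaces.Box(low=-np.inf, high=np.inf, shape=(3,), dtype=np.float32),
        "desired_goal": spaces.Box(low=-np.inf, high=np.inf, shape=(3,), dtype=np.float32),
    }
)

# The Tuple variant of the same idea; keys become positional indices.
tuple_space = spaces.Tuple(
    (
        spaces.Box(low=0, high=255, shape=(84, 84, 3), dtype=np.uint8),
        spaces.Box(low=-np.inf, high=np.inf, shape=(3,), dtype=np.float32),
    )
)
```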
Motivation
Currently, Stable-Baselines3 only supports a single observation (an image or a vector). Extending this to Tuple/Dict observations would add support for environments that have several different types of inputs.
Current Plan
I plan on implementing this feature, but I'd like some pointers on how to go about it.
Below is my current plan; I'd really like to verify that it is a good way forward.
I think that I need to create a child class of `RolloutBufferSamples` which stores a list/dict of observations rather than a single observation. However, this may require adding a bool on the `rollout_buffer` itself so that the conversion to tensor (see `on_policy_algorithm.py`) can be performed over each element of the list/dict. It's not my favorite approach and I'd like to avoid it if possible.

From here, I think that the other necessary changes would permeate through the repository:

- a feature extractor in `torch_layers.py` that can take in multiple observations
- updates to `util.py` and `preprocessing.py` to handle the new rollout type

Is this a good approach?
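For the first step of this plan, a minimal sketch of what such a child class could look like (the class name is hypothetical; the field names mirror SB3's existing `RolloutBufferSamples` NamedTuple):

```python
from typing import Dict, NamedTuple
import torch as th

class DictRolloutBufferSamples(NamedTuple):
    """Hypothetical variant of RolloutBufferSamples where observations
    are stored as a dict of tensors keyed by subspace name."""
    observations: Dict[str, th.Tensor]
    actions: th.Tensor
    old_values: th.Tensor
    old_log_prob: th.Tensor
    advantages: th.Tensor
    returns: th.Tensor
```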