[Feature Request] basic dict/tuple support for observations #216
Hello, thank you for creating the issue.
Yes, for v1.1+ (so after v1.0, which should be released soon).
I'm not sure if a child class is needed (as the dict case is more general) but it may be cleaner. A draft PR would help to clarify that point I think. @partiallytyped mentioned separating storage and sampling in #81 (see #81 (comment)).
> Why can't you treat both cases with one helper function?

What would you propose instead?
Yes, we need to create some rules too: preprocess each observation and then concatenate the ones that can be concatenated (for instance, images with images, or 1D vectors with 1D vectors).
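A rough sketch of what such a preprocessing rule could look like (a hypothetical helper, not SB3 code; the grouping heuristic and the helper name are assumptions):

```python
import numpy as np

def group_and_concatenate(obs_dict):
    """Hypothetical helper: preprocess each entry of a dict observation,
    then concatenate the compatible ones (images with images,
    1D vectors with 1D vectors)."""
    images, vectors = [], []
    for key in sorted(obs_dict):
        obs = np.asarray(obs_dict[key])
        if obs.ndim == 3:
            # (H, W, C) image-like entry
            images.append(obs)
        else:
            # everything else is flattened to a 1D vector
            vectors.append(obs.ravel())
    # Images can only be stacked along the channel axis if H and W match.
    stacked_images = np.concatenate(images, axis=-1) if images else None
    flat_vector = np.concatenate(vectors) if vectors else None
    return stacked_images, flat_vector
```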
This is in theory not needed, as you only need to change the [...].
You will need to modify the predict method in [...]. EDIT: I would definitely treat the Tuple case as a special case of the Dict (where keys are integers).
Thanks for the feedback :)
That's a good idea: converting the observations to tensors can be handled by a util method, and then the policy can use them directly.
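For instance, a minimal sketch of such a util method (the name `obs_to_tensor` is hypothetical, not SB3 API):

```python
import numpy as np
import torch as th

def obs_to_tensor(obs, device):
    """Hypothetical util: convert an observation (array, dict of arrays,
    or tuple of arrays) to torch tensors on the given device."""
    if isinstance(obs, dict):
        return {key: th.as_tensor(np.asarray(o), device=device) for key, o in obs.items()}
    if isinstance(obs, tuple):
        return tuple(th.as_tensor(np.asarray(o), device=device) for o in obs)
    return th.as_tensor(np.asarray(obs), device=device)
```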
I don't think that this is wanted or possible in all cases. For instance, in the situation where one is using multiple images, they need not be the same dimension, and it's not clear how one would concatenate them in this case. Additionally, there may be situations in which one will want to create their own version of a "CombinedExtractor" and encode certain 1D vectors separately (e.g., goal-conditioned agents). I think that concatenating the observations this way inhibits the full usage of Dict/Tuple observations. After some more browsing, the biggest issue I see is with the Environment Wrappers. Specifically, [...]
Perhaps these issues can be avoided by using something akin to the new ObsDictWrapper (I had an old version of develop when I was first looking into this issue :) ). I haven't looked deeper into it yet, but I can imagine that such a wrapper would handle stacking and transforming the different "subspaces". Thoughts?
Good point; however, creating a separate CNN for each image type may be quite costly...
Short answer: we won't add dict support for [...].
So I think I have been able to solve the VecFrameStack and the VecTransposeImage issues. :) I created a CombinedExtractor that, for now, creates an MLP for each Box observation space that isn't an image and a CNN for each Box observation that is. The features from all of them are then concatenated together and put through another MLP before being given to the policy.
One can override this functionality in a custom extractor. Note: I'll be making a pull request as soon as it goes through an internal audit. I tried to avoid using the [...]. I'm currently making sure that all tests pass; oddly enough, most of the failing tests are in [...].
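A compressed sketch of the extractor described above (not the actual PR code; the hidden sizes, the image heuristic, and the CNN architecture are placeholders):

```python
import numpy as np
import gym
import torch as th
from torch import nn

def is_image_space(space: gym.spaces.Box) -> bool:
    # Simplified heuristic: 3D uint8 Box spaces are treated as images.
    return len(space.shape) == 3 and space.dtype == np.uint8

class CombinedExtractor(nn.Module):
    """Sketch: a CNN per image subspace, an MLP per non-image Box subspace,
    all features concatenated and passed through a final MLP."""

    def __init__(self, observation_space: gym.spaces.Dict, features_dim: int = 64):
        super().__init__()
        extractors = {}
        total_size = 0
        for key, subspace in observation_space.spaces.items():
            if is_image_space(subspace):
                # Assumes channel-first (C, H, W) images, e.g. after VecTransposeImage.
                n_channels = subspace.shape[0]
                extractors[key] = nn.Sequential(
                    nn.Conv2d(n_channels, 32, kernel_size=8, stride=4),
                    nn.ReLU(),
                    nn.AdaptiveAvgPool2d((1, 1)),
                    nn.Flatten(),
                )
                total_size += 32
            else:
                flat_dim = int(np.prod(subspace.shape))
                extractors[key] = nn.Sequential(nn.Flatten(), nn.Linear(flat_dim, 64), nn.ReLU())
                total_size += 64
        self.extractors = nn.ModuleDict(extractors)
        self.final_mlp = nn.Sequential(nn.Linear(total_size, features_dim), nn.ReLU())

    def forward(self, observations: dict) -> th.Tensor:
        # Encode each subspace separately, then fuse the features.
        encoded = [extractor(observations[key]) for key, extractor in self.extractors.items()]
        return self.final_mlp(th.cat(encoded, dim=1))
```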
Good to hear =) Btw, I think we should limit support to only "first-level" dicts (so we forbid dicts of dicts).
Ok, anyway, we will discuss it there too ;)
I would rather concatenate every Box and then use only one MLP. Unlike images, a Box can always be flattened to a 1D vector.
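A tiny sketch of that alternative (assuming batched tensors as input; the helper name and the key ordering by sort are assumptions):

```python
import torch as th

def concat_boxes(obs_dict, box_keys):
    """Sketch: flatten every non-image Box entry and concatenate them
    so that a single MLP can process the result."""
    flat = [obs_dict[key].flatten(start_dim=1).float() for key in sorted(box_keys)]
    return th.cat(flat, dim=1)
```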
That's ok; in fact, as soon as we have proper dict support, the HER implementation should be rewritten.
So you need to add a check to prevent the env from being wrapped with a [...].
@araffin I created the PR. There are still a few things that need to be done to complete the integration, but I want to be sure it falls in line with the repo, so I am creating the request early.
I will be working on 2) and 4) for the time being.
After a quick look, I'm not sure we need that much complexity.
It would be nice to have at least one common [...] for the Box. After a quick look at your PR, I think it would be better to almost duplicate the buffer classes, because having too many branches (with the if/else) is hard to follow. From now on, I think I will try to write comments in the PR only ;)
I can run it on my side if you want and push the results, but it would be better for you to set up black and isort at least (you can take a look at the Makefile for the details of the commands).
🚀 Feature
As mentioned in the RoadMap, adding dict/tuple support for observations is a planned feature. This follows the OpenAI Gym API, which has Tuple and Dict as possible observation spaces.
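As a concrete illustration (a minimal sketch using the standard gym API; the space contents are made up for this example):

```python
import numpy as np
from gym import spaces

# A Dict observation space mixing an image with two 1D vectors,
# as used e.g. by goal-conditioned environments.
dict_space = spaces.Dict(
    {
        "camera": spaces.Box(low=0, high=255, shape=(84, 84, 3), dtype=np.uint8),
        "achieved_goal": spaces.Box(low=-np.inf, high=np.inf, shape=(3,), dtype=np.float32),
        "desired_goal": spaces.Box(low=-np.inf, high=np.inf, shape=(3,), dtype=np.float32),
    }
)

# The Tuple variant of the same idea; keys become positional indices.
tuple_space = spaces.Tuple(
    (
        spaces.Box(low=0, high=255, shape=(84, 84, 3), dtype=np.uint8),
        spaces.Box(low=-np.inf, high=np.inf, shape=(3,), dtype=np.float32),
    )
)
```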
Motivation
Currently, Stable-Baselines3 only supports a single observation (an image or a vector). Extending this to Tuple/Dict observations would add support for environments that have several different types of inputs.
Current Plan
I plan on implementing this feature, but I'd like some pointers on how to go about it.
Below is my current plan; I'd really like to verify that it is a good way forward.
I think that I need to create a child class of `RolloutBufferSamples` which stores a list/dict of observations rather than a single observation. However, this may require adding a bool on the `rollout_buffer` itself so that the conversion to tensor (see `on_policy_algorithm.py`) can be performed over each element of the list/dict. It's not my favorite approach and I'd like to avoid it if possible.

From here, I think that the other necessary changes would permeate through the repository:

- a feature extractor in `torch_layers.py` that can take in multiple observations
- updates to `util.py` and `preprocessing.py` to handle the new rollout type

Is this a good approach?
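For the first step of this plan, a minimal sketch of what such a child class could look like (the class name is hypothetical; the field names mirror SB3's existing `RolloutBufferSamples` NamedTuple):

```python
from typing import Dict, NamedTuple
import torch as th

class DictRolloutBufferSamples(NamedTuple):
    """Hypothetical variant of RolloutBufferSamples where observations
    are stored as a dict of tensors keyed by subspace name."""
    observations: Dict[str, th.Tensor]
    actions: th.Tensor
    old_values: th.Tensor
    old_log_prob: th.Tensor
    advantages: th.Tensor
    returns: th.Tensor
```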