[Question] Changing observation space during training #1157
❓ Question

I have a question regarding changing the observation space during training.

I'm using an attention block to handle a multi-agent task. While attention makes it easy to vary the number of agents, Stable-Baselines3 itself reports a dimension error when the number of landmarks (and hence the observation space) changes on reset. In this case, may I have your suggestions on how to achieve this? Thank you!

Comments
Related: #1077 (comment)
Following #1077 (comment), I would suggest using a constant observation space size (equal to the largest possible observation). To do this, pad the inner observation (the one that varies in size) with zeros (or any other value) to obtain a constant-size outer observation (the one returned by step). You can also return the associated mask in the info dict. This way you stick to the paradigm of a gym environment, whose observation and action spaces should not change.
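A minimal sketch of this padding idea, assuming a hypothetical maximum of 10 agents and a made-up per-agent feature size (the reward/termination logic below is placeholder only):

```python
import numpy as np
import gym
from gym import spaces

MAX_AGENTS = 10  # hypothetical maximum number of agents
OBS_DIM = 4      # hypothetical per-agent feature size

class PaddedObsEnv(gym.Env):
    """Toy env whose true number of agents varies, padded to a fixed shape."""

    def __init__(self):
        super().__init__()
        self.observation_space = spaces.Box(
            -np.inf, np.inf, shape=(MAX_AGENTS, OBS_DIM), dtype=np.float32
        )
        self.action_space = spaces.Discrete(2)
        self.n_agents = MAX_AGENTS

    def _padded_obs(self):
        # The inner observation varies in size; the outer one never does.
        inner = np.random.randn(self.n_agents, OBS_DIM).astype(np.float32)
        obs = np.zeros((MAX_AGENTS, OBS_DIM), dtype=np.float32)
        obs[: self.n_agents] = inner
        mask = np.zeros(MAX_AGENTS, dtype=bool)
        mask[: self.n_agents] = True  # True = real agent, False = padding
        return obs, mask

    def reset(self):
        self.n_agents = np.random.randint(1, MAX_AGENTS + 1)  # new count each episode
        obs, _ = self._padded_obs()
        return obs

    def step(self, action):
        obs, mask = self._padded_obs()
        return obs, 0.0, False, {"mask": mask}  # mask travels in the info dict
```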
Thank you so much for this quick reply and your help! For example, if there are 8 agents while the maximum number is 10, do you suggest that the observations for the last two agents be all zeros? But those zeros would also be fed into the network, which would affect training and testing. And do you mean to make use of the info dict to return the mask?
Yes.
It won't if you mask it properly, see https://ai.stackexchange.com/questions/22957/how-can-transformers-handle-arbitrary-length-input
The mask could be returned with the info dict, yes.
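For reference, this is roughly how such a mask can be fed to attention in PyTorch; the sizes below are made up, and note that `key_padding_mask` expects `True` at the positions to ignore:

```python
import torch
import torch.nn as nn

attn = nn.MultiheadAttention(embed_dim=32, num_heads=4, batch_first=True)
x = torch.randn(2, 10, 32)              # (batch, MAX_AGENTS, embed_dim)
valid = torch.zeros(2, 10, dtype=torch.bool)
valid[0, :8] = True                     # sample 0 has 8 real agents
valid[1, :5] = True                     # sample 1 has 5 real agents
# Invert the "valid" mask: padded slots must be True to be ignored as keys.
out, _ = attn(x, x, x, key_padding_mask=~valid)
```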
Thank you! But I'm still a little bit confused, as the training loop is wrapped by the Stable-Baselines3 framework. If I have the mask returned by the info dict, how do I make use of it during training? I'm so sorry for bothering you again. Thank you again for your great help!
Indeed, this will not work as is in SB3. You have to create your own features extractor. Thinking about it, I advise you to use a Dict observation space that contains both the padded observation and its mask, and to handle the masking inside a custom features extractor. And I think that's all you have to do. That's the easiest way in my opinion.
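A sketch of that Dict observation space, reusing the hypothetical sizes from the padding sketch above:

```python
import numpy as np
from gym import spaces

MAX_AGENTS, OBS_DIM = 10, 4  # same hypothetical sizes as above

observation_space = spaces.Dict({
    "obs": spaces.Box(-np.inf, np.inf, shape=(MAX_AGENTS, OBS_DIM), dtype=np.float32),
    "mask": spaces.MultiBinary(MAX_AGENTS),
})
# reset()/step() then return {"obs": padded_obs, "mask": mask} rather than
# hiding the mask in the info dict, so the policy can actually see it.
```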
If you manage to make this work, please share it here; it may help other people.
Got it. And thank you so much!
I have been using the CustomNetwork before and it worked well. Then I just tried the suggested masking approach: besides modifying the observation in the gym env, I have also changed the custom network accordingly. I found an existing issue, but it is not helpful, as the problem there could be solved by just using a different policy class.
As I mentioned before, I recommend that you use a custom features extractor (instead of a custom network), as your need does not seem to require that level of customization.
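A sketch of such a features extractor for the Dict observation above; the attention layer and sizes are illustrative, not a tested recipe:

```python
import torch.nn as nn
from stable_baselines3.common.torch_layers import BaseFeaturesExtractor

class MaskedAttentionExtractor(BaseFeaturesExtractor):
    """Embeds per-agent observations and attends over real agents only."""

    def __init__(self, observation_space, embed_dim: int = 32):
        super().__init__(observation_space, features_dim=embed_dim)
        obs_dim = observation_space["obs"].shape[-1]
        self.embed = nn.Linear(obs_dim, embed_dim)
        self.attn = nn.MultiheadAttention(embed_dim, num_heads=4, batch_first=True)

    def forward(self, observations):
        x = self.embed(observations["obs"])   # (batch, MAX_AGENTS, embed_dim)
        valid = observations["mask"].bool()   # (batch, MAX_AGENTS)
        x, _ = self.attn(x, x, x, key_padding_mask=~valid)
        # Masked mean pooling: padded slots contribute nothing to the feature.
        m = valid.unsqueeze(-1).float()
        return (x * m).sum(dim=1) / m.sum(dim=1).clamp(min=1.0)
```

With that, something like `PPO("MultiInputPolicy", env, policy_kwargs=dict(features_extractor_class=MaskedAttentionExtractor))` should wire it in.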
Thank you so much again for this quick reply. But in my case it's better to use a custom network with an attention block (not a custom features extractor that builds the layers sequentially). I'll first try your suggested Multiple Inputs and Dictionary Observations and see how it works. But can these two be combined with each other?
In fact, I realize that this sentence is not clear at all. Let me explain it better: the features extractor is the part that receives the raw Dict observation and applies the mask; its output is then fed to the policy and value networks, so your custom network itself does not need to change.
Oh, I got it. Let me double-check: do you mean I first use a features extractor to get the wanted observation based on the mask from the dictionary, and then input the masked observation into my custom policy network, without any modification, to obtain the policy and value?
But is it possible to make use of them at the same time: the features extractor only for dealing with the observation, and the custom policy network to compute the policy and value? Maybe this would work?
Yes.
No.
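For what it's worth, SB3 does allow both at once: the features extractor is chosen via `policy_kwargs`, while the policy/value heads come from a custom policy class, along the lines of the advanced custom-policy example in the SB3 docs. A rough, untested sketch (`MaskedAttentionExtractor` is the one above; the heads are placeholders for the attention-based network):

```python
import torch.nn as nn
from stable_baselines3 import PPO
from stable_baselines3.common.policies import MultiInputActorCriticPolicy

class CustomNetwork(nn.Module):
    """Placeholder policy/value heads consuming the extractor's features."""

    def __init__(self, feature_dim: int):
        super().__init__()
        self.latent_dim_pi = 64  # required by SB3
        self.latent_dim_vf = 64  # required by SB3
        self.policy_net = nn.Sequential(nn.Linear(feature_dim, 64), nn.ReLU())
        self.value_net = nn.Sequential(nn.Linear(feature_dim, 64), nn.ReLU())

    def forward(self, features):
        return self.forward_actor(features), self.forward_critic(features)

    def forward_actor(self, features):
        return self.policy_net(features)

    def forward_critic(self, features):
        return self.value_net(features)

class CustomMaskedPolicy(MultiInputActorCriticPolicy):
    def _build_mlp_extractor(self) -> None:
        self.mlp_extractor = CustomNetwork(self.features_dim)

# `env` is assumed to be the Dict-observation env sketched earlier.
model = PPO(
    CustomMaskedPolicy,
    env,
    policy_kwargs=dict(features_extractor_class=MaskedAttentionExtractor),
)
```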
Thank you! Actually, the custom policy network is needed in my task (I need it to build an attention block in a non-sequential manner; I consulted about it before here), so I would keep the custom policy unchanged. I just tried this command to vary the number of agents at each episode:
May I have your advice? I'm so sorry for the inconvenience, and I really appreciate your great help!
I think I'm at the end of what I can advise you, both in terms of knowledge and the time I can devote to it. Also, I think we're getting off track with this issue.
I think you haven't fully understood masking. All observation tensors must have the same size, and each is associated with a mask (this is where their "intrinsic length" is encoded), so there are never observation tensors of varying shape. I advise you to read up on the subject, and to ask your question on the Discord if you're still stuck.
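To make the point concrete: with a proper mask, the values stored in the padded slots are irrelevant, so nothing is lost by keeping every observation tensor the same size. A tiny self-contained check (shapes made up):

```python
import torch

def masked_mean(x, mask):
    """Mean over the agent axis, counting only unmasked (real) slots."""
    m = mask.unsqueeze(-1).float()
    return (x * m).sum(dim=1) / m.sum(dim=1)

obs = torch.randn(1, 10, 4)                 # 10 slots, 8 of them real
mask = torch.zeros(1, 10, dtype=torch.bool)
mask[0, :8] = True

garbage = obs.clone()
garbage[0, 8:] = 123.0                      # overwrite the padded slots

# Identical result: padding values never reach the network's output.
assert torch.allclose(masked_mean(obs, mask), masked_mean(garbage, mask))
```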
Got it. Really appreciate your help!