-
-
Notifications
You must be signed in to change notification settings - Fork 920
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixed wrappers.vector.RecordEpisodeStatistics
episode length computation from new autoreset api
#1018
Fixed wrappers.vector.RecordEpisodeStatistics
episode length computation from new autoreset api
#1018
Conversation
@@ -56,16 +60,41 @@ def test_record_episode_statistics(num_envs, env_id="CartPole-v1", num_steps=100 | |||
data_equivalence(wrapper_vector_reward, vector_wrapper_reward) | |||
data_equivalence(wrapper_vector_terminated, vector_wrapper_terminated) | |||
data_equivalence(wrapper_vector_truncated, vector_wrapper_truncated) | |||
data_equivalence(wrapper_vector_info, vector_wrapper_info) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this isn't possible due to the episode time taken will vary between the two
This is the reason for the sequent code
A comment would be helpful
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Strangely, this test did not fail for me. I will move it back to where it was originally. Not sure why I moved it in the first place.
wrapper_vector_time = wrapper_vector_info["episode"].pop("t") | ||
vector_wrapper_time = vector_wrapper_info["episode"].pop("t") | ||
assert wrapper_vector_time.shape == vector_wrapper_time.shape | ||
assert wrapper_vector_time.dtype == vector_wrapper_time.dtype | ||
|
||
data_equivalence(wrapper_vector_info, vector_wrapper_info) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once we pop t
, why can we not just do data_equivalence
for the two infos?
Why did this test fail previously?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I messed up on the testing of this, it should be assert data_equivalence(...)
The new testing code should be
assert data_equivalence(wrapper_vector_obs, vector_wrapper_obs)
assert data_equivalence(wrapper_vector_reward, vector_wrapper_reward)
assert data_equivalence(wrapper_vector_terminated, vector_wrapper_terminated)
assert data_equivalence(wrapper_vector_truncated, vector_wrapper_truncated)
if "episode" in wrapper_vector_info:
assert "episode" in vector_wrapper_info
wrapper_vector_time_taken = wrapper_vector_info["episode"].pop("t")
vector_wrapper_time_taken = vector_wrapper_info["episode"].pop("t")
assert wrapper_vector_time_taken.shape == vector_wrapper_time_taken.shape
assert wrapper_vector_time_taken.dtype == vector_wrapper_time_taken.dtype
vector_wrapper_info["episode"].pop("_l")
vector_wrapper_info["episode"].pop("_r")
vector_wrapper_info["episode"].pop("_t")
assert data_equivalence(wrapper_vector_info, vector_wrapper_info)
This should now test it correctly.
I added max_episode_steps
to the make_vec
such that episode ends for this to actually be tested (update to main, this was previously broken)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, that is why it did not fail when I moved the info dict comparison! Do you want me to fix the test, or are you going to do it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you update the test to my suggested code above, I think it is simplier than your code and achieves the same thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I totally agree that your version is simpler. For some reason, I thought the test did not consider the episode lengths at all, but after looking at it more carefully, I now understand what it actually does. I just changed the code accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it does that implicitly with the info check at the end
wrappers.vector.RecordEpisodeStatistics
episode length computation from new autoreset api
…ion from new autoreset api Farama-Foundation#1018
927e504
to
760aabc
Compare
I moved the info dict comparison back to its original place |
…ion from new autoreset api Farama-Foundation#1018
760aabc
to
0190ead
Compare
…ion from new autoreset api
0190ead
to
c0bc85c
Compare
@@ -118,10 +120,13 @@ def step( | |||
infos, dict | |||
), f"`vector.RecordEpisodeStatistics` requires `info` type to be `dict`, its actual type is {type(infos)}. This may be due to usage of other wrappers in the wrong order." | |||
|
|||
self.episode_returns += rewards | |||
self.episode_lengths += 1 | |||
self.episode_returns[self.prev_dones] = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An alternative is self.episode_returns = np.where(self.prev_dones, self.episode_returns + rewards, 0)
However as this recreates the array each time I don't think this is a good idea
wrapper_vector_time = wrapper_vector_info["episode"].pop("t") | ||
vector_wrapper_time = vector_wrapper_info["episode"].pop("t") | ||
assert wrapper_vector_time.shape == vector_wrapper_time.shape | ||
assert wrapper_vector_time.dtype == vector_wrapper_time.dtype | ||
|
||
data_equivalence(wrapper_vector_info, vector_wrapper_info) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it does that implicitly with the info check at the end
Description
Fixed wrappers.vector.RecordEpisodeStatistics episode length computation.
Fixes #1017
Type of change
Please delete options that are not relevant.
Checklist:
pre-commit
checks withpre-commit run --all-files
(seeCONTRIBUTING.md
instructions to set it up)